LORD-WINGERSKY ALGORITHM VERSION 2.0 FOR HIERARCHICAL ITEM
FACTOR MODELS WITH APPLICATIONS IN TEST SCORING, SCALE
ALIGNMENT, AND MODEL FIT TESTING
Li Cai 

University of California, Los Angeles 

Abstract 

Lord and Wingersky’s (1984) recursive algorithm for creating summed score based 
likelihoods and posteriors has a proven track record in unidimensional item response theory 
(IRT) applications. Extending the recursive algorithm to handle multidimensionality is 
relatively simple, especially with fixed quadrature because the recursions can be defined on a 
grid formed by direct products of quadrature points. However, the increase in computational 
burden remains exponential in the number of dimensions, making the implementation of the 
recursive algorithm cumbersome for truly high dimensional models. In this paper, a 
dimension reduction method that is specific to the Lord-Wingersky recursions is developed. 

This method can take advantage of the restrictions implied by hierarchical item factor models 
(e.g., the bifactor model [Gibbons & Hedeker, 1992], the testlet model [Wainer, Bradlow, & 
Wang, 2007], or the two-tier model [Cai, 2010b]), such that a version of the Lord-Wingersky 
recursive algorithm can operate on a dramatically reduced set of quadrature points. For 
instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of 
the number of factors. The new algorithm not only provides an effective mechanism to 
produce summed score to IRT scaled score translation tables properly adjusted for residual 
dependence, but leads to new applications in test scoring, linking, and model fit checking as 
well. Simulated and empirical examples are used to illustrate the new applications. 

Introduction 

The paper by Lord and Wingersky (1984) contains a terse description of a remarkably 
elegant recursive algorithm for computing summed score based likelihoods from the perspective 
of item response theory (IRT). According to Google Scholar, the paper has only been a moderate 
success in terms of citation counts (over 137 times as of this writing). However, the Lord- 
Wingersky algorithm motivated a number of important developments in educational and 
psychological measurement. For example, Thissen, Pommerich, Billeaud, and Williams (1995) 
extended the algorithm to test scoring with ordered polytomous IRT models. Thissen and Wainer 
(2001) presented a detailed account of related summed score based methods for test scoring 
using IRT, including methods for mixed-format tests involving a combination of multiple-choice 
and constructed response items. Orlando, Sherbourne, and Thissen (2000) applied the
Lord-Wingersky algorithm to summed score based test linking. Chen and Thissen (1999) derived an
item parameter calibration method based on summed scores. Orlando and Thissen (2000)
proposed a solution to the item fit testing problem with a slight alteration of the original
Lord-Wingersky algorithm.

Multidimensional IRT has flourished in recent years (e.g., Reckase, 2009). In particular, 
full-information item factor analysis (Bock, Gibbons, & Muraki, 1988) has become one of the 
central methodological pillars in educational and psychological measurement research (see a 
recent review by Wirth & Edwards, 2007). As IRT becomes adopted in new fields such as 
health-related patient reported outcomes measurement (see Reeve et al., 2007), new item 
parameter estimation algorithms (e.g., Cai, 2010a; Edwards, 2010; Schilling & Bock, 2005) and 
flexible software implementations (e.g., Cai, 2012; Cai, Thissen, & du Toit, 2011; Wu & Bentler, 
2011) have emerged. 

One particular kind of confirmatory item factor analysis, full-information item bifactor 
analysis, has caught special attention among psychometric researchers (Gibbons & Hedeker, 
1992). In an item bifactor model, all items load on a general dimension, and an item is permitted 
to load on at most one specific dimension. The specific dimensions are in essence group factors 
that account for residual dependence above and beyond the general dimension. The factor pattern 
in a bifactor analysis is an example of the hierarchical factor solution (Holzinger & Swineford, 
1937; Schmid & Leiman, 1957). 

The popularity of the item bifactor model has been, in no small part, due to Gibbons and 
Hedeker’s (1992) discovery of a dimension reduction method. With dimension reduction, 
maximum marginal likelihood estimation of item bifactor models requires at most 2-dimensional 
numerical quadrature, irrespective of the number of factors in the model. Thus, truly high- 
dimensional confirmatory factor models may be fitted to item response data with reasonable 
numerical accuracy, computational stability, and most importantly, within a reasonable amount 
of time. Gibbons and Hedeker’s (1992) dimension reduction method did much to free item factor 
analysis from the “curse” of dimensionality. 

The computational efficiency of the hierarchical item factor formulation prompted a flurry 
of recent activities in the technical literature (e.g., Gibbons et al., 2007; Jeon, Rijmen, & Rabe- 
Hesketh, 2013; Rijmen, Vansteelandt, & De Boeck, 2008; Rijmen, 2009), where new 
computational methods and extensions of the basic bifactor model are presented (see, e.g., Cai, 
2010b; Cai, Yang, & Hansen, 2011). Within educational measurement, the closely related testlet 
response theory model (Wainer, Bradlow, & Wang, 2007) also garnered much attention. The 
testlet response theory model is a second-order item factor analysis model, but it is typically 
shown as a constrained version of the item bifactor model (Glas, Wainer, & Bradlow, 2000; Li, Bolt, 
& Fu, 2006; Rijmen, 2010; Yung, McLeod, & Thissen, 1999). 

Renewed interest in the hierarchical item factor model brings new methodological 
questions. As Reise (2012) noted, the bifactor model is appealing because it offers a convenient 
mechanism to accommodate nuisance multidimensionality without sacrificing the interpretability 
of the general dimension, which ultimately represents the target latent construct being measured, 
in contrast to other multidimensional IRT models (e.g., with multiple correlated latent variables). 
The existence of unequivocal general dimension(s) and the continued prevalence of summed 
scoring of assessment instruments imply that there is much theoretical and applied interest in 
being able to characterize the relation between observed summed scores and the general 
dimension(s), which calls for an extension of the classical Lord-Wingersky algorithm to the case 
of hierarchical item factor analysis models. 

Even as one may extend the Lord-Wingersky algorithm to standard multidimensional IRT 
models using direct product quadrature rules, the computational complexity increases 
exponentially as more factors are added into the model. Therefore, a different strategy is required:
one that efficiently utilizes the restrictions implied by the hierarchical item factor
analysis model to achieve dimension reduction analytically. The combination of Lord-Wingersky
recursions with analytical dimension reduction results in what amounts to version 2.0 of the
Lord-Wingersky algorithm. Its details will be one of the foci of this paper.

With the availability of such an algorithm, a number of technical issues can be resolved. 
First, when multidimensional bifactor or testlet structures demonstrate superior fit to calibration 
data than the single-factor model, one can now construct summed score to IRT scaled score 
translation tables properly adjusted for residual dependence. Second, in terms of test linking, one 
can also achieve more than an extension of Orlando et al.’s (2000) summed-score based method 
for linking distinct groups. Thissen, Varni, Stucky, Liu, Irwin, and DeWalt’s (2011) calibrated 
projection method utilized two correlated general dimensions in a two-tier item factor model 
(Cai, 2010b) to produce the summed score to scaled score conversion table so that two closely 
related (yet not identical) instruments can be linked together with the method of projection. 
Third, the score combination methods for mixed format tests described by Rosa, Swygert, 
Nelson, and Thissen (2001) can be obtained as a by-product of the Lord-Wingersky 2.0 
algorithm, with no specialized computation required. Last but not least, summed score
computations can be useful for model fit checking. For instance, Orlando and Thissen’s (2000)
highly successful summed score based item fit statistic (S-X²) can be extended to test item fit for
bifactor models. The model-implied and observed summed score probabilities can also form
diagnostic indices to check the ubiquitous latent variable normality assumption. The remainder
of this paper will discuss each of the above applications in turn.

The Original Lord-Wingersky Algorithm 

Summed Score Likelihoods 


Let there be a total of $i = 1, \ldots, I$ ordinal items. Let $T_i(k|\theta)$ be the ith item’s traceline for
category $k = 0, 1, \ldots, K_i - 1$. The summed scores range from 0 to $S = \sum_{i=1}^{I} (K_i - 1)$. From the
perspective of IRT, the likelihood for the response pattern $\mathbf{u} = (u_1, \ldots, u_I)$ can be expressed as

$$L(\mathbf{u}|\theta) = \prod_{i=1}^{I} T_i(u_i|\theta), \tag{1}$$

due to the assumption of independence of item responses conditional on the latent trait $\theta$. Define
$\|\mathbf{u}\| = \sum_{i=1}^{I} u_i$ as a notational shorthand for the summed score associated with response pattern
$\mathbf{u}$. The likelihood for summed score $s = 0, \ldots, S$ is defined as

$$L(s|\theta) = \sum_{s = \|\mathbf{u}\|} L(\mathbf{u}|\theta) = \sum_{s = \|\mathbf{u}\|} \prod_{i=1}^{I} T_i(u_i|\theta), \tag{2}$$

where the summation in Equation (2) is over all such response patterns that lead to a summed
score equal to $s$. Given a population (prior) distribution $g(\theta)$, the unnormalized posterior for
summed score $s$ is

$$p(\theta|s) \propto L(s|\theta)\, g(\theta), \tag{3}$$

and the (marginal) probability for summed score $s$ is

$$p(s) = \int L(s|\theta)\, g(\theta)\, d\theta, \tag{4}$$

which implies that the normalized posterior of summed score $s$ is

$$p(\theta|s) = \frac{L(s|\theta)\, g(\theta)}{p(s)}. \tag{5}$$

Therefore, the posterior mean is

$$E(\theta|s) = \frac{1}{p(s)} \int \theta\, L(s|\theta)\, g(\theta)\, d\theta, \tag{6}$$

and the posterior variance is

$$V(\theta|s) = E(\theta^2|s) - E^2(\theta|s) = \frac{1}{p(s)} \int \theta^2\, L(s|\theta)\, g(\theta)\, d\theta - E^2(\theta|s). \tag{7}$$



The posterior mean and the square root of the posterior variance may be taken as the point
estimate and the standard error of measurement for $\theta$. The marginal probability, posterior mean,
and posterior variance for the summed scores are key estimands that the IRT model can generate 
as long as the categories are ordered to allow for an approximate monotonic relationship between 
summed scores and scaled scores. 

Dichotomous Item Responses 

It is more convenient to introduce the Lord-Wingersky algorithm for dichotomously scored 
items. The extension to polytomous data is straightforward (as shown in this paper’s Polytomous 
Item Responses section). For now, all $K_i$ are taken to be identically equal to 2. In this case, the
maximum summed score $S$ is equal to the number of items $I$. The definition in Equation (2)
requires evaluating all $2^I$ response pattern likelihoods, which becomes computationally
intractable when $I$ is large. On the other hand, Lord and Wingersky’s (1984) algorithm builds the
summed score likelihoods recursively, one item at a time. Let $L_i(s|\theta)$ denote the likelihood for
summed score $s$, after item $i$ has been added into the computation.

The algorithm starts by initializing the summed score likelihoods from item 1. As such,
there are two possibilities, $L_1(0|\theta) = T_1(0|\theta)$ and $L_1(1|\theta) = T_1(1|\theta)$, at the end of Step 1. Next,
the second item is added. Note that at the end of the second step there will be three summed
scores. The likelihood for summed score 0 is $L_2(0|\theta) = L_1(0|\theta) T_2(0|\theta)$. The likelihood for
summed score 1 is a combination of two distinct possibilities:
$L_2(1|\theta) = L_1(1|\theta) T_2(0|\theta) + L_1(0|\theta) T_2(1|\theta)$. The likelihood for summed score 2 is
$L_2(2|\theta) = L_1(1|\theta) T_2(1|\theta)$. Then, in Step 3, item 3 is added. The likelihood for summed score 0 is
$L_3(0|\theta) = L_2(0|\theta) T_3(0|\theta)$. The likelihood for summed score 1 is
$L_3(1|\theta) = L_2(1|\theta) T_3(0|\theta) + L_2(0|\theta) T_3(1|\theta)$. The likelihood for summed score 2 is
$L_3(2|\theta) = L_2(2|\theta) T_3(0|\theta) + L_2(1|\theta) T_3(1|\theta)$. Finally, the likelihood for
summed score 3 is $L_3(3|\theta) = L_2(2|\theta) T_3(1|\theta)$. More generally, after initialization at item 1, in
Step $i$ of the recursive algorithm, item $i = 2, \ldots, I$ is added into the existing summed score
likelihoods according to the following rules:

$$\begin{aligned}
L_i(0|\theta) &= L_{i-1}(0|\theta)\, T_i(0|\theta), \\
L_i(s|\theta) &= L_{i-1}(s|\theta)\, T_i(0|\theta) + L_{i-1}(s-1|\theta)\, T_i(1|\theta), \quad \text{for } s = 1, \ldots, i-1, \\
L_i(i|\theta) &= L_{i-1}(i-1|\theta)\, T_i(1|\theta).
\end{aligned} \tag{8}$$

The recursion is repeated until all $I$ items have been added. At the end of the recursions, each
accumulated $L_I(s|\theta)$ will be equal to the summed score likelihood $L(s|\theta)$ defined earlier in
Equation (2). As one can see, the recursive algorithm does not require explicitly enumerating all
$2^I$ response pattern likelihoods.
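
The recursions in Equation (8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lord_wingersky_dichotomous` and the small array of probabilities are hypothetical (they do not come from the original work), and the tracelines are assumed to be already evaluated at a common set of θ values.

```python
import numpy as np

def lord_wingersky_dichotomous(p_correct):
    """Accumulate summed score likelihoods for dichotomous items (Equation 8).

    p_correct: array of shape (I, Q); row i holds T_i(1|theta) at Q theta values.
    Returns an array of shape (I + 1, Q); row s holds L(s|theta).
    """
    p_correct = np.asarray(p_correct, dtype=float)
    n_items, n_points = p_correct.shape
    like = np.zeros((n_items + 1, n_points))
    # Step 1: initialize with item 1.
    like[0] = 1.0 - p_correct[0]
    like[1] = p_correct[0]
    for i in range(1, n_items):            # 0-based index of items 2, ..., I
        t1 = p_correct[i]
        t0 = 1.0 - t1
        new = np.zeros_like(like)
        new[0] = like[0] * t0
        # Scores 1..i+1: keep the score (response 0) or move up by one (response 1).
        # Rows of `like` above the current maximum score are still zero.
        new[1:i + 2] = like[1:i + 2] * t0 + like[0:i + 1] * t1
        like = new
    return like

# Hypothetical T_i(1|theta) for 3 items at 2 theta values (illustration only).
p = np.array([[0.2, 0.7],
              [0.4, 0.6],
              [0.5, 0.9]])
L = lord_wingersky_dichotomous(p)
```

Each column of `L` sums to 1 because the summed scores partition the response patterns at a fixed θ, which gives a quick sanity check on any implementation.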
In practice, because the integrals in Equations (4), (6), and (7) cannot be solved
analytically, it is necessary to evaluate the summed score likelihoods over a set of quadrature
points so that numerical summaries of the posterior can be computed. For instance, the marginal
probability can be approximated to arbitrary precision using a $Q$-point rule:

$$p(s) = \int L(s|\theta)\, g(\theta)\, d\theta \approx \sum_{q=1}^{Q} L(s|X_q)\, W(X_q), \tag{9}$$

where $X_q$ is a quadrature node and $W(X_q)$ is the corresponding quadrature weight. Gauss-Hermite
quadrature is used extensively in the literature because the prior distribution of $\theta$ is
typically assumed to be Gaussian. However, for simplicity, rectangular quadrature may be used,
where $W(X_q)$ is a set of normalized ordinates of the prior density, i.e.,
$W(X_q) = g(X_q) / \sum_{q=1}^{Q} g(X_q)$, and the quadrature nodes are chosen to represent a sufficiently fine grid over an
interval that captures most of the probability mass of the posterior (e.g., from -4 to +4), for a
standard Gaussian prior.
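
As a rough illustration of the rectangular rule, the sketch below builds normalized weights from prior ordinates and compares a coarse and a fine grid. The helper names `rectangular_rule` and `std_normal`, and the choice of 81 points on [-4, 4], are assumptions made for this example only.

```python
import numpy as np

def std_normal(x):
    """Standard Gaussian density g(theta)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def rectangular_rule(lower, upper, n_points, density):
    """Equally spaced nodes X_q with weights W(X_q) = g(X_q) / sum_q g(X_q)."""
    nodes = np.linspace(lower, upper, n_points)
    ordinates = density(nodes)
    return nodes, ordinates / ordinates.sum()

nodes5, w5 = rectangular_rule(-2.0, 2.0, 5, std_normal)      # coarse grid
nodes81, w81 = rectangular_rule(-4.0, 4.0, 81, std_normal)   # finer grid

# Quadrature approximation of the prior variance, the integral of theta^2 g(theta).
prior_var_5 = np.sum(w5 * nodes5**2)
prior_var_81 = np.sum(w81 * nodes81**2)
```

The coarse 5-point grid visibly understates the unit prior variance, while the finer grid comes close to 1, which is one way to judge whether a grid is fine enough.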


An Illustrative Example 

It is instructive to consider a simple test with 3 dichotomous items. The item tracelines are
characterized by the 2-parameter logistic model:

$$T_i(1|\theta) = \frac{1}{1 + \exp[-(c_i + a_i \theta)]}, \tag{10}$$

for the correct/endorsement response ($k = 1$), where $c_i$ and $a_i$ are the item intercept and slope
parameters. The incorrect/non-endorsement response ($k = 0$) has a traceline that is equal to
$T_i(0|\theta) = 1.0 - T_i(1|\theta)$. The intercept parameters for the 3 items are -1.0, -0.2, and 0.6,
respectively. The slope parameters are 1.2, 1.0, and 0.8, respectively.
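
A minimal sketch of Equation (10) in Python, using the intercepts and slopes just stated, evaluated at five equally spaced θ values from -2 to 2. The function name `traceline_2pl` and the array layout (items in rows, θ values in columns) are choices made for this illustration.

```python
import numpy as np

def traceline_2pl(theta, c, a):
    """T_i(1|theta) in slope-intercept form, as in Equation (10)."""
    return 1.0 / (1.0 + np.exp(-(c + a * theta)))

theta = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
intercepts = np.array([-1.0, -0.2, 0.6])
slopes = np.array([1.2, 1.0, 0.8])

# Broadcasting yields one row per item and one column per theta value.
p1 = traceline_2pl(theta[None, :], intercepts[:, None], slopes[:, None])
p0 = 1.0 - p1   # T_i(0|theta)
```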
Table 1

Ordinates of item response functions and quadrature weights evaluated at 5
rectangular quadrature points for the 3 hypothetical items in the example

θ             -2.00   -1.00    0.00    1.00    2.00

T_1(1|θ)       .032    .100    .269    .550    .802
T_2(1|θ)       .100    .232    .450    .690    .858
T_3(1|θ)       .269    .450    .646    .802    .900

T_1(0|θ)       .968    .900    .731    .450    .198
T_2(0|θ)       .900    .769    .550    .310    .142
T_3(0|θ)       .731    .550    .354    .198    .100

W(θ)           .054    .244    .403    .244    .054


Table 1 shows the values of the tracelines evaluated at 5 equally-spaced quadrature points
at $\theta$ levels -2, -1, 0, 1, and 2, as well as the corresponding quadrature weights at each point. The
quadrature weights are normalized ordinates of a standard Gaussian prior density for $\theta$. Based on
the item tracelines and weights in Table 1, one can apply the Lord-Wingersky algorithm to 
recursively accumulate the 4 summed score likelihoods (0, 1, 2, 3) for the 3 dichotomously 
scored items. Table 2 shows the recursive computations in some detail. As one can see, after the 
initializations in Step 1, the recursive algorithm follows Equation (8) until all items have been 
added. The set of 4 summed score likelihoods at the end of Step 3 are represented numerically at 
the specified quadrature points. Of course, in practice, many more quadrature points are used for 
better precision. Table 2 serves as an illustration similar to Thissen and Wainer’s (2001) Table 
3.8 (p. 124). 
Table 2

Accumulating the summed score likelihoods at 5 rectangular quadrature points for the 3 hypothetical items with
the Lord-Wingersky algorithm

Summed score likelihoods                                   θ = -2     -1      0      1      2

Step 1: Initialize summed score likelihoods by adding Item 1
L_1(0|θ) = T_1(0|θ)                                          .968   .900   .731   .450   .198
L_1(1|θ) = T_1(1|θ)                                          .032   .100   .269   .550   .802

Step 2: Add Item 2 to existing summed score likelihoods
L_2(0|θ) = L_1(0|θ)T_2(0|θ)                                  .871   .692   .402   .140   .028
L_2(1|θ) = L_1(1|θ)T_2(0|θ) + L_1(0|θ)T_2(1|θ)               .126   .285   .477   .481   .284
L_2(2|θ) = L_1(1|θ)T_2(1|θ)                                  .003   .023   .121   .379   .688

Step 3: Add Item 3 to existing summed score likelihoods
L(0|θ) = L_3(0|θ) = L_2(0|θ)T_3(0|θ)                         .637   .380   .142   .028   .002
L(1|θ) = L_3(1|θ) = L_2(1|θ)T_3(0|θ) + L_2(0|θ)T_3(1|θ)      .326   .468   .429   .207   .053
L(2|θ) = L_3(2|θ) = L_2(2|θ)T_3(0|θ) + L_2(1|θ)T_3(1|θ)      .036   .141   .351   .461   .324
L(3|θ) = L_3(3|θ) = L_2(2|θ)T_3(1|θ)                         .001   .010   .078   .304   .620


With the quadrature weights in Table 1 and the summed score likelihoods in Table 2, one
may directly compute the unnormalized summed score posteriors according to Equation (3) by
multiplying the summed score likelihood $L(s|\theta)$ with the prior weight $W(\theta)$ at each of the
chosen quadrature points. Table 3 shows the posterior computations in detail. The unnormalized
summed score posteriors are found by multiplying (point-by-point) the values of the summed
score likelihoods (the second panel) with the corresponding quadrature weights (the first panel).
Summing over the quadrature representation of the unnormalized summed score posterior, as per
Equation (9), the marginal probabilities of the summed scores are shown in Table 3 under the
column heading $p(s)$. These are the IRT model-implied probabilities for each of the summed
scores. The posterior means $E(\theta|s)$ and posterior variances $V(\theta|s)$ are also presented in Table 3,
essentially in the form of a summed score to IRT scaled score translation table. For instance, a
summed score of 0 can be translated to an IRT scaled score of -.81 with standard error equal to
the square root of .59. The probabilities can be used to construct percentile tables. Tables such as
this facilitate the adoption of IRT scoring in practical situations.
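
The posterior computations described above can be sketched as follows. The function name `posterior_summaries` is hypothetical; the inputs are the three-decimal weights from Table 1 and the final likelihoods from Table 2, so the outputs should agree with the tabled summaries only up to the rounding of those inputs (roughly .01).

```python
import numpy as np

def posterior_summaries(like, nodes, weights):
    """Quadrature versions of Equations (3), (9), (6), and (7).

    like: (S + 1, Q) summed score likelihoods; nodes, weights: length-Q arrays.
    Returns p(s), E(theta|s), and V(theta|s), each of length S + 1.
    """
    unnorm = like * weights                          # unnormalized posteriors
    p_s = unnorm.sum(axis=1)                         # marginal probabilities
    mean = (unnorm * nodes).sum(axis=1) / p_s        # posterior means
    var = (unnorm * nodes**2).sum(axis=1) / p_s - mean**2
    return p_s, mean, var

nodes = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
weights = np.array([.054, .244, .403, .244, .054])   # W(theta), Table 1
like = np.array([                                    # L(s|theta), Step 3 of Table 2
    [.637, .380, .142, .028, .002],
    [.326, .468, .429, .207, .053],
    [.036, .141, .351, .461, .324],
    [.001, .010, .078, .304, .620],
])
p_s, mean, var = posterior_summaries(like, nodes, weights)
```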



Table 3

Characterizing the summed score likelihoods and posteriors using the representation at 5 rectangular quadrature
points for the 3 hypothetical items

                                        Quadrature points θ                 Posterior summaries
                                   -2     -1      0      1      2        p(s)   E(θ|s)   V(θ|s)

Quadrature weights
W(θ) =                           .054   .244   .403   .244   .054

Summed score likelihoods L(s|θ)
L(0|θ) =                         .637   .380   .142   .028   .002
L(1|θ) =                         .326   .468   .429   .207   .053
L(2|θ) =                         .036   .141   .351   .461   .324
L(3|θ) =                         .001   .010   .078   .304   .620

Unnormalized summed score posteriors p(θ|s)
p(θ|0) ∝ L(0|θ)W(θ) =            .035   .093   .057   .007   .000         .19     -.81      .59
p(θ|1) ∝ L(1|θ)W(θ) =            .018   .114   .173   .051   .003         .36     -.26      .62
p(θ|2) ∝ L(2|θ)W(θ) =            .002   .034   .141   .113   .018         .31      .36      .61
p(θ|3) ∝ L(3|θ)W(θ) =            .000   .003   .031   .074   .034         .14      .98      .53


Marginal Reliability of Scaled Scores 

With the summed score to scaled score conversion table, a kind of marginal reliability
coefficient can be computed for the scaled scores. Let $\bar{V}(\theta)$ denote the average error variance
associated with $\theta$. It may be obtained from the conversion table as a weighted sum

$$\bar{V}(\theta) = \sum_{s=0}^{S} V(\theta|s)\, p(s). \tag{11}$$

The marginal reliability of the scaled score conversions is defined as

$$\bar{\rho} = 1 - \frac{\bar{V}(\theta)}{\sigma^2(\theta)}, \tag{12}$$

where $\sigma^2(\theta)$ is the total (prior) variance of $\theta$. From the results in Table 3, the average error
variance is approximately 0.60. Since the latent trait $\theta$ has an assumed standard normal prior
distribution, the total variance is 1.0. The marginal reliability of the scaled scores based on the
summed scores is therefore approximately 0.40.
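
Equations (11) and (12) amount to a weighted sum and a ratio. The sketch below, with the hypothetical function name `marginal_reliability`, applies them to the two-decimal $p(s)$ and $V(\theta|s)$ entries of Table 3; because those entries are rounded, the result should be read as approximate.

```python
import numpy as np

def marginal_reliability(post_var, p_s, prior_var=1.0):
    """Average error variance (Equation 11) and marginal reliability (Equation 12)."""
    post_var = np.asarray(post_var, dtype=float)
    p_s = np.asarray(p_s, dtype=float)
    avg_error_var = np.sum(post_var * p_s)
    reliability = 1.0 - avg_error_var / prior_var
    return avg_error_var, reliability

# Two-decimal entries copied from the posterior summaries in Table 3.
p_s = [.19, .36, .31, .14]
post_var = [.59, .62, .61, .53]
avg_v, rho = marginal_reliability(post_var, p_s)   # prior variance assumed to be 1.0
```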

Polytomous Item Responses 

Recall that $T_i(k|\theta)$ is the ith item’s traceline for category $k = 0, 1, \ldots, K_i - 1$, and the
number of categories ($K_i \geq 2$) may be different across items. Define $S_i = \sum_{j=1}^{i} (K_j - 1)$ as a
notational shorthand for the maximum summed score after item $i$ has been included. Clearly the
maximum summed score is $S = S_I$.

The first step of the algorithm still involves the initialization of the summed score
likelihoods at the category tracelines of item 1 so that $L_1(s|\theta) = T_1(s|\theta)$ for $s = 0, \ldots, K_1 - 1$. In
Step $i = 2, \ldots, I$, the category tracelines of item $i$ are added into the $S_{i-1} + 1$ available summed score
likelihoods from the previous step, similar to the dichotomous case, but more complex bookkeeping
is required since the number of combinations leading up to the same summed score
increases as the number of categories increases. For item $i$ with $K_i$ categories, and summed score
$s = 0, \ldots, S_i$, the summed score likelihood can be written as

$$L_i(s|\theta) = \sum_{s^*=0}^{S_{i-1}} \sum_{k=0}^{K_i - 1} L_{i-1}(s^*|\theta)\, T_i(k|\theta)\, 1_s(s^* + k), \tag{13}$$

where $1_s(s^* + k)$ is an indicator function that takes on a value of 1 if and only if $s$ is equal to
$s^* + k$, and 0 otherwise. The summation in Equation (13) is over the existing summed score
likelihoods and categories of item $i$, while preserving the restriction that the combination must
lead to a summed score equal to $s$. Equation (13) reduces to the recursions in Equation (8) when
all items are dichotomous. After all $I$ items have been added, $L_I(s|\theta)$ will become the desired
summed score likelihood $L(s|\theta)$ for summed score $s = 0, \ldots, S$.
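
One way to read Equation (13) is that every existing score $s^*$ is shifted by each category value $k$ and the products are accumulated at $s^* + k$. The Python sketch below follows that reading; the function name `lord_wingersky_polytomous` and the tiny traceline arrays are hypothetical and serve only as an illustration.

```python
import numpy as np

def lord_wingersky_polytomous(tracelines):
    """Summed score likelihoods for ordinal items (Equation 13).

    tracelines: list of arrays; item i has shape (K_i, Q), row k = T_i(k|theta).
    Returns an array of shape (S + 1, Q); row s holds L(s|theta).
    """
    like = np.asarray(tracelines[0], dtype=float)    # initialize with item 1
    for t in tracelines[1:]:
        t = np.asarray(t, dtype=float)
        n_prev, n_cat = like.shape[0], t.shape[0]
        new = np.zeros((n_prev + n_cat - 1, like.shape[1]))
        for k in range(n_cat):
            # The slice k : k + n_prev plays the role of the indicator 1_s(s* + k).
            new[k:k + n_prev] += like * t[k]
        like = new
    return like

# Hypothetical tracelines at 2 theta values: a 3-category and a 2-category item.
item_a = np.array([[.5, .2], [.3, .3], [.2, .5]])
item_b = np.array([[.6, .4], [.4, .6]])
L = lord_wingersky_polytomous([item_a, item_b])
```

With only two-category items the inner loop runs over $k = 0, 1$ and the update collapses to the dichotomous recursion.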

Lord-Wingersky Algorithm Version 2.0 
A General Hierarchical Item Factor Model 

Cai’s (2010b) two-tier model represents a general hierarchical model that includes the
standard (correlated-traits) multidimensional IRT model, item bifactor model, and testlet
response theory models as special cases. In this model, two kinds of latent variables are
considered, primary and specific. This creates a partitioning of $\boldsymbol{\theta}$ into two mutually exclusive
parts: $\boldsymbol{\theta} = (\boldsymbol{\eta}, \boldsymbol{\xi})$, where $\boldsymbol{\eta}$ is an $M$-dimensional vector of (potentially correlated) primary latent
dimensions and $\boldsymbol{\xi}$ is an $N$-dimensional vector of (mutually orthogonal) specific latent dimensions
that are orthogonal to the primary dimensions. In the two-tier model, an item is allowed to load
on all $M$ primary dimensions in any identified manner and at most 1 specific dimension. Using a
path diagram, Figure 1 shows a hypothetical two-tier model with 20 items (the rectangles) that
load on $M = 2$ primary dimensions that are correlated, as well as $N = 4$ specific dimensions.
Obviously, a two-tier model with only 1 primary dimension becomes a bifactor or a testlet
model.



Figure 1. Path diagram of a two-tier model with 2 correlated primary dimensions and 4 specific dimensions.


Without loss of generality, let $T_i(k|\boldsymbol{\theta})$ be the ith item’s traceline (or perhaps more properly
referred to as trace-surface for multidimensional $\boldsymbol{\theta}$) for category $k$. In principle, the
Lord-Wingersky algorithm can be defined on a set of quadrature points that are formed by direct
products of unidimensional quadrature points. This leads to an exponentially increasing amount
of computation in the number of latent dimensions. Fortunately, the two-tier formulation leads to
a computational shortcut that circumvents the integration problem. This is the main result of the
paper.

General Approach 

In the two-tier model, the item trace-surface $T_i(k|\boldsymbol{\theta})$ can be redefined as
$T_i^n(k|\boldsymbol{\eta}, \boldsymbol{\xi}) = T_i^n(k|\boldsymbol{\eta}, \xi_n)$, for item $i$ that loads on specific dimension $n$. The last equality comes from the fact
that an item is permitted to load on at most one specific dimension, say, $n$, in a two-tier model. If
an item does not load on any specific dimension, it may be conveniently grouped with the first
item cluster for the purposes of summed score computations and no generality is lost. Let there
be $I_n$ items that load on specific dimension $\xi_n$. As such, these $I_n$ items form a testlet or item
cluster that may be residually dependent after accounting for $\boldsymbol{\eta}$. For a two-tier model, the
likelihood for response pattern $\mathbf{u}$ can be expressed as
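$$L(\mathbf{u}|\boldsymbol{\theta}) = L(\mathbf{u}|\boldsymbol{\eta}, \boldsymbol{\xi}) = \prod_{n=1}^{N} \prod_{i=1}^{I_n} T_i^n(u_i^n|\boldsymbol{\eta}, \xi_n), \tag{14}$$

where $u_i^n$ is the response to item $i$ in item cluster $n$. Let $g_n(\xi_n)$ be the density function of the nth
specific dimension. Integrating out the dependence on $\boldsymbol{\xi}$, the likelihood of $\boldsymbol{\eta}$ based on pattern $\mathbf{u}$
can be written as

$$\begin{aligned}
L(\mathbf{u}|\boldsymbol{\eta}) &= \int \cdots \int \prod_{n=1}^{N} \prod_{i=1}^{I_n} T_i^n(u_i^n|\boldsymbol{\eta}, \xi_n)\, g_1(\xi_1) \cdots g_N(\xi_N)\, d\xi_1 \cdots d\xi_N \\
&= \prod_{n=1}^{N} \int \prod_{i=1}^{I_n} T_i^n(u_i^n|\boldsymbol{\eta}, \xi_n)\, g_n(\xi_n)\, d\xi_n,
\end{aligned} \tag{15}$$

where the second line in Equation (15) has utilized the two-tier model assumption of the
independence of the specific dimensions, thereby transforming the original $N$-fold multiple
integral on the first line into a product of $N$ one-fold integrals. This is the same derivation as the
dimension reduction procedure in maximum marginal likelihood item parameter estimation for
two-tier or bifactor/testlet models (see, e.g., Cai, 2010b). Let

$$L^n(\mathbf{u}^n|\boldsymbol{\eta}) = \int \prod_{i=1}^{I_n} T_i^n(u_i^n|\boldsymbol{\eta}, \xi_n)\, g_n(\xi_n)\, d\xi_n \tag{16}$$

denote the likelihood of $\boldsymbol{\eta}$ based on the subset of responses $\mathbf{u}^n = (u_1^n, \ldots, u_{I_n}^n)$ in the nth item
cluster such that $\mathbf{u} = (\mathbf{u}^1, \ldots, \mathbf{u}^n, \ldots, \mathbf{u}^N)$. The likelihood of $\boldsymbol{\eta}$ for summed score $s$ can be written
as

$$L(s|\boldsymbol{\eta}) = \sum_{s = \|\mathbf{u}\|} L(\mathbf{u}|\boldsymbol{\eta}) = \sum_{s = \|\mathbf{u}\|} \prod_{n=1}^{N} L^n(\mathbf{u}^n|\boldsymbol{\eta}), \tag{17}$$

which is entirely analogous to Equation (2). Integrating over $\boldsymbol{\eta}$, the marginal probability is
$p(s) = \int L(s|\boldsymbol{\eta})\, h(\boldsymbol{\eta})\, d\boldsymbol{\eta}$ (cf. Equation 4), where $h(\boldsymbol{\eta})$ is the density of the primary dimensions,
and the summed score posterior is $p(\boldsymbol{\eta}|s) = L(s|\boldsymbol{\eta})\, h(\boldsymbol{\eta}) / p(s)$ (cf. Equation 5).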

The dominating insight from Equation (17) is that conditional on the general dimension(s),
the testlets or item clusters become the fungible units of model building and computation, just as
items are the fungible units in the standard Lord-Wingersky recursions. All that is required is an
extra stage of recursions. In the first stage, for the nth item cluster, likelihoods for the
within-cluster summed scores are accumulated over the latent variable space spanned by $(\boldsymbol{\eta}, \xi_n)$. This is
the standard Lord-Wingersky algorithm as applied to the items in cluster $n$ on a set of direct product
quadrature points spanning the space of $(\boldsymbol{\eta}, \xi_n)$. For each within-cluster summed score
likelihood, the dependence on the specific dimension $\xi_n$ is subsequently integrated out, leaving
the within-cluster summed score likelihoods as functions of the general dimension(s) $\boldsymbol{\eta}$ alone. In
the second stage, the $N$ clusters are treated as $N$ multiple-category items, and the within-cluster
summed score likelihoods from the first stage are treated as if they are category tracelines
defined on $\boldsymbol{\eta}$. The standard Lord-Wingersky algorithm for polytomous IRT is applied to accumulate
the final summed score likelihoods.


Details of the Lord-Wingersky 2.0 Algorithm 


To avoid notational clutter, it is convenient to introduce the new Lord-Wingersky
algorithm for hierarchical item factor models using one of the simplest two-tier models, namely,
the logistic item bifactor model for dichotomous responses. In this case, $T_i^n(k|\boldsymbol{\eta}, \xi_n)$ reduces
further to $T_i^n(k|\eta, \xi_n)$, and $\eta$ represents the single general dimension. The IRT model for the
correct/endorsement response can be written as

$$T_i^n(1|\boldsymbol{\theta}) = T_i^n(1|\eta, \xi_n) = \frac{1}{1 + \exp[-(c_i + a_i^0 \eta + a_i^n \xi_n)]}. \tag{18}$$

Note that there are two slope parameters per item in the bifactor model (cf. Equation 10). The
slope for the general dimension is $a_i^0$ and the slope for the nth specific dimension is $a_i^n$. The item
intercept continues to be denoted as $c_i$.

With no loss of generality, consider the nth item cluster. The first stage of Lord-Wingersky
algorithm 2.0 starts with the initialization of the within-cluster summed score likelihood:
$L_1^n(0|\eta, \xi_n) = T_1^n(0|\eta, \xi_n)$ and $L_1^n(1|\eta, \xi_n) = T_1^n(1|\eta, \xi_n)$. Then, each of the remaining items
within the cluster is added to the likelihoods according to the following set of recursions
for $1 < i \leq I_n$ (cf. Equation 8):

$$\begin{aligned}
L_i^n(0|\eta, \xi_n) &= L_{i-1}^n(0|\eta, \xi_n)\, T_i^n(0|\eta, \xi_n), \\
L_i^n(s|\eta, \xi_n) &= L_{i-1}^n(s|\eta, \xi_n)\, T_i^n(0|\eta, \xi_n) + L_{i-1}^n(s-1|\eta, \xi_n)\, T_i^n(1|\eta, \xi_n), \quad \text{for } s = 1, \ldots, i-1, \\
L_i^n(i|\eta, \xi_n) &= L_{i-1}^n(i-1|\eta, \xi_n)\, T_i^n(1|\eta, \xi_n).
\end{aligned} \tag{19}$$

At the end of the recursions the within-cluster summed score likelihoods will have been
accumulated as $L_{I_n}^n(s|\eta, \xi_n) = L^n(s|\eta, \xi_n)$ for $s = 0, \ldots, r_n$, where $r_n = \sum_{i=1}^{I_n} (K_i - 1)$ is the
maximum within-cluster summed score for item cluster $n$. Integrating out the dependence on $\xi_n$,
the summed score likelihood as a function of $\eta$ can be approximated with quadrature as

$$L^n(s|\eta) = \int L^n(s|\eta, \xi_n)\, g_n(\xi_n)\, d\xi_n \approx \sum_{q=1}^{Q} L^n(s|\eta, X_q)\, W_n(X_q), \tag{20}$$

where $X_q$ is a set of $Q$ rectangular quadrature points with weights
$W_n(X_q) = g_n(X_q) / \sum_{q=1}^{Q} g_n(X_q)$. At the end of the first stage, each of the $N$ item clusters is characterized by a set of
summed score likelihoods in terms of $\eta$.


In the second stage, $L^n(s|\eta)$ is treated as though it is a category traceline of a polytomous
item (with $r_n + 1$ categories), and the Lord-Wingersky algorithm for polytomous item responses
introduced in the Polytomous Item Responses section is directly applied. As before, let
$S_n = \sum_{j=1}^{n} r_j$ be the maximum summed score after item cluster $n$ has been included in the recursions.
To initialize, set the step 1 summed score likelihood to the summed score likelihoods from the
first cluster, i.e., $L_1(s|\eta) = L^1(s|\eta)$ for $s = 0, \ldots, r_1$. In step $n = 2, \ldots, N$, the summed score
likelihoods from cluster $n$ are added into the $S_{n-1} + 1$ available summed score likelihoods from the
previous step:

$$L_n(s|\eta) = \sum_{s^*=0}^{S_{n-1}} \sum_{k=0}^{r_n} L_{n-1}(s^*|\eta)\, L^n(k|\eta)\, 1_s(s^* + k), \tag{21}$$

where $1_s(s^* + k)$ is still an indicator function that takes on a value of 1 if and only if $s$ is equal
to $s^* + k$, and 0 otherwise. Entirely analogous to Equation (13), the summation in Equation (21)
is over the existing summed score likelihoods for scores $s^* = 0, \ldots, S_{n-1}$ and the $r_n + 1$ summed
scores from item cluster $n$, while preserving the restriction that the combination must lead to a
summed score of $s$.


At the conclusion of step N , the likelihoods L N (s\g) are equal to the desired summed score 
likelihoods L(s\g) for each s. Recall that h(rj) is the density of the primary dimension. Posterior 
summaries for summed score s can be readily computed using quadrature from p(j]\s) — 
L(s\g)h(ji)/p(s), where the marginal probability p(s) = / L(s\g)h(g)dg can be approximated 
with Q-point rectangular quadrature as p(s) ~ Eq=i^(s|^q) W(X q ), with weights given by 
W(X q ) = h(X q )/'Z% =1 h(X q ). Posterior mean and variance can be obtained with similar 
quadrature computations. 

If there is more than one primary dimension in the model, or if any of the items are
polytomous, the core structure of the algorithm remains the same. One would only have to
replace the first-stage recursions in Equation (19) by computations similar to those defined in the
Polytomous Item Responses section, and use direct product quadrature rules for integrals over the
vector-valued $\boldsymbol{\eta}$.
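The second-stage recursion and the quadrature-based posterior summaries can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: the function names `combine_clusters` and `posterior_summaries` are hypothetical, and the inputs are the rounded cluster likelihoods and weights that appear in Tables 7 and 9 of the illustrative example, so the results agree with the tabled values only up to rounding.

```python
def combine_clusters(cluster_liks):
    """Second-stage recursion in the form of Equation (21).

    cluster_liks[n][k][q] holds L^n(k | eta_q): the likelihood of
    within-cluster summed score k for cluster n at quadrature point q,
    with the specific dimension already integrated out.
    Returns acc[s][q], the summed score likelihoods over all clusters.
    """
    acc = [row[:] for row in cluster_liks[0]]           # step 1: initialize
    n_points = len(acc[0])
    for lik in cluster_liks[1:]:                        # steps n >= 2
        new = [[0.0] * n_points for _ in range(len(acc) + len(lik) - 1)]
        for s_star, prev in enumerate(acc):
            for k, cur in enumerate(lik):
                for q in range(n_points):
                    # the indicator 1_s(s* + k) becomes the index s* + k
                    new[s_star + k][q] += prev[q] * cur[q]
        acc = new
    return acc


def posterior_summaries(lik_s, nodes, weights):
    """Return p(s), E(eta|s), V(eta|s) by rectangular quadrature."""
    post = [l * w for l, w in zip(lik_s, weights)]
    p_s = sum(post)
    mean = sum(x * p for x, p in zip(nodes, post)) / p_s
    var = sum((x - mean) ** 2 * p for x, p in zip(nodes, post)) / p_s
    return p_s, mean, var


# Cluster likelihoods L^n(k | eta) at eta = -2, -1, 0, 1, 2 (rounded, Table 7).
L1 = [[.891, .728, .469, .212, .062],
      [.103, .235, .382, .411, .288],
      [.006, .037, .148, .377, .649]]
L2 = [[.742, .519, .277, .106, .028],
      [.230, .375, .446, .375, .230],
      [.028, .106, .277, .519, .742]]
L3 = [[.469, .302, .166, .077, .029],
      [.364, .396, .364, .285, .192],
      [.166, .302, .469, .638, .779]]
nodes = [-2, -1, 0, 1, 2]
weights = [.054, .244, .403, .244, .054]                # W(eta), Table 9

L = combine_clusters([L1, L2, L3])
for s, lik_s in enumerate(L):
    p_s, mean, var = posterior_summaries(lik_s, nodes, weights)
    print(s, round(p_s, 2), round(mean, 2), round(var, 2))
```

Because the indicator function only selects the pairs with $s^* + k = s$, the double sum is a discrete convolution over scores, carried out separately at each quadrature point.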
An Illustrative Example

Consider 6 hypothetical dichotomous items arranged in 3 doublets. There are 4 latent
variables in this model, one primary dimension $\eta$ on which all items load and 3 specific
dimensions $\xi_1$, $\xi_2$, and $\xi_3$. Table 4 shows the item parameters for these items, as well as the bifactor
structure wherein items 1-2, 3-4, and 5-6 form into three doublets with nonzero loadings on the
specific dimensions. The prior distributions of the latent variables are taken to be standard
normal. Table 5 shows the ordinates of the item response functions as well as quadrature weights
for the specific dimensions over a 5 x 5 grid defined by the direct product of equally spaced
quadrature points at -2, -1, 0, 1, and 2. Due to space constraints, only values at a selected subset
of the grid points are shown in Table 5. The weights for specific dimensions are normalized
ordinates of standard normal densities as functions of $\xi_1$, $\xi_2$, and $\xi_3$, and repeated over the
quadrature points for $\eta$. $W_1(\xi_1)$, $W_2(\xi_2)$, and $W_3(\xi_3)$ are the same in this example because the
prior distributions of $\xi_1$, $\xi_2$, and $\xi_3$ are all standard normal (but they need not always be standardized;
see, e.g., Cai et al., 2011).


Table 4

Item parameters for the 6 dichotomous items with hypothetical
bifactor structure

Item    $a^0$    $a^1$    $a^2$    $a^3$      $c$
1        1.2      1.0                        -1.0
2        1.2      1.0                         -.6
3        1.0               .8                 -.2
4        1.0               .8                  .2
5         .8                       1.2         .6
6         .8                       1.2        1.0
Table 5

Ordinates of item response functions and quadrature weights evaluated over the 5 x 5 direct product rectangular
quadrature points for the 6 hypothetical items with bifactor structure

Function                                    Values at selected grid points
$\eta$                                -2     -2     -2   ...      0   ...      2      2      2
$\xi_1$                               -2     -1      0   ...      0   ...      0      1      2
$\xi_2$                               -2     -1      0   ...      0   ...      0      1      2
$\xi_3$                               -2     -1      0   ...      0   ...      0      1      2

Quadrature weights
$W_1(\xi_1)$ =                      .054   .244   .403   ...   .403   ...   .403   .244   .054
$W_2(\xi_2)$ =                      .054   .244   .403   ...   .403   ...   .403   .244   .054
$W_3(\xi_3)$ =                      .054   .244   .403   ...   .403   ...   .403   .244   .054

Item 1: $T_1^1(1|\eta,\xi_1)$ =     .004   .012   .032   ...   .269   ...   .802   .917   .968
Item 2: $T_2^1(1|\eta,\xi_1)$ =     .007   .018   .047   ...   .354   ...   .858   .943   .978
Item 3: $T_1^2(1|\eta,\xi_2)$ =     .022   .047   .100   ...   .450   ...   .858   .931   .968
Item 4: $T_2^2(1|\eta,\xi_2)$ =     .032   .069   .142   ...   .550   ...   .900   .953   .978
Item 5: $T_1^3(1|\eta,\xi_3)$ =     .032   .100   .269   ...   .646   ...   .900   .968   .990
Item 6: $T_2^3(1|\eta,\xi_3)$ =     .047   .142   .354   ...   .731   ...   .931   .978   .993

Item 1: $T_1^1(0|\eta,\xi_1)$ =     .996   .988   .968   ...   .731   ...   .198   .083   .032
Item 2: $T_2^1(0|\eta,\xi_1)$ =     .993   .982   .953   ...   .646   ...   .142   .057   .022
Item 3: $T_1^2(0|\eta,\xi_2)$ =     .978   .953   .900   ...   .550   ...   .142   .069   .032
Item 4: $T_2^2(0|\eta,\xi_2)$ =     .968   .931   .858   ...   .450   ...   .100   .047   .022
Item 5: $T_1^3(0|\eta,\xi_3)$ =     .968   .900   .731   ...   .354   ...   .100   .032   .010
Item 6: $T_2^3(0|\eta,\xi_3)$ =     .953   .858   .646   ...   .269   ...   .069   .022   .007
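
The entries of Table 5 can be checked with a short script. This sketch assumes a logistic item response function with linear predictor $a^0\eta + a^s\xi_s + c$ (slope-intercept form); that functional form is an assumption here, although it does reproduce the tabled ordinates checked below to three decimals. The names `ITEMS`, `traceline`, and `normal_weights` are illustrative only.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Table 4 parameters: (general slope a0, specific slope, intercept c, cluster)
ITEMS = [
    (1.2, 1.0, -1.0, 1), (1.2, 1.0, -0.6, 1),
    (1.0, 0.8, -0.2, 2), (1.0, 0.8, 0.2, 2),
    (0.8, 1.2, 0.6, 3), (0.8, 1.2, 1.0, 3),
]
NODES = [-2.0, -1.0, 0.0, 1.0, 2.0]


def traceline(item, eta, xi):
    """P(score = 1 | eta, xi) under the assumed logistic form."""
    a0, a_s, c, _cluster = item
    return logistic(a0 * eta + a_s * xi + c)


def normal_weights(nodes):
    """Standard normal ordinates, normalized to sum to 1 over the nodes."""
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return [d / total for d in dens]


W = normal_weights(NODES)
print([round(w, 3) for w in W])                  # weights at -2, ..., 2
print(round(traceline(ITEMS[0], -2.0, -2.0), 3)) # Item 1 at (-2, -2)
print(round(traceline(ITEMS[5], 2.0, 2.0), 3))   # Item 6 at (2, 2)
```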


Table 6 illustrates the first stage of the new recursive algorithm. In this case, summed score
likelihoods are accumulated for each of the 3 item clusters. Within each item cluster, there are
only two dichotomously scored items, so the summed scores range from 0 to 2. The summed
score likelihoods are represented over separate grids formed by the direct product of the
quadrature points for the primary dimension $\eta$ crossed with $\xi_1$, $\xi_2$, and $\xi_3$, respectively. In Table
7, the specific dimensions are integrated out for each item cluster. This leaves the summed score
likelihoods as functions of the primary dimension $\eta$ alone.
Table 6

Accumulating summed score likelihoods within each item cluster

Item cluster                                     Likelihoods

Initialize Cluster 1's summed score likelihoods by adding Item 1

Within-cluster                      Quadrature grid for ($\eta$, $\xi_1$)
score likelihood        $\eta$        -2     -2     -2   ...      0   ...      2      2      2
                        $\xi_1$       -2     -1      0   ...      0   ...      0      1      2
$L_1^1(0|\eta,\xi_1) = T_1^1(0|\eta,\xi_1)$
                                    .996   .988   .968   ...   .731   ...   .198   .083   .032
$L_1^1(1|\eta,\xi_1) = T_1^1(1|\eta,\xi_1)$
                                    .004   .012   .032   ...   .269   ...   .802   .917   .968

Add Item 2 to Cluster 1's summed score likelihoods

$L_2^1(0|\eta,\xi_1) = L_1^1(0|\eta,\xi_1)T_2^1(0|\eta,\xi_1)$
                                    .989   .970   .922   ...   .472   ...   .028   .005   .001
$L_2^1(1|\eta,\xi_1) = L_1^1(0|\eta,\xi_1)T_2^1(1|\eta,\xi_1) + L_1^1(1|\eta,\xi_1)T_2^1(0|\eta,\xi_1)$
                                    .011   .030   .077   ...   .433   ...   .284   .131   .053
$L_2^1(2|\eta,\xi_1) = L_1^1(1|\eta,\xi_1)T_2^1(1|\eta,\xi_1)$
                                    .000   .000   .002   ...   .095   ...   .688   .864   .947

Initialize Cluster 2's summed score likelihoods by adding Item 3

Within-cluster                      Quadrature grid for ($\eta$, $\xi_2$)
score likelihood        $\eta$        -2     -2     -2   ...      0   ...      2      2      2
                        $\xi_2$       -2     -1      0   ...      0   ...      0      1      2
$L_1^2(0|\eta,\xi_2) = T_1^2(0|\eta,\xi_2)$
                                    .978   .953   .900   ...   .550   ...   .142   .069   .032
$L_1^2(1|\eta,\xi_2) = T_1^2(1|\eta,\xi_2)$
                                    .022   .047   .100   ...   .450   ...   .858   .931   .968

Add Item 4 to Cluster 2's summed score likelihoods

$L_2^2(0|\eta,\xi_2) = L_1^2(0|\eta,\xi_2)T_2^2(0|\eta,\xi_2)$
                                    .947   .887   .773   ...   .248   ...   .014   .003   .001
$L_2^2(1|\eta,\xi_2) = L_1^2(0|\eta,\xi_2)T_2^2(1|\eta,\xi_2) + L_1^2(1|\eta,\xi_2)T_2^2(0|\eta,\xi_2)$
                                    .053   .110   .213   ...   .505   ...   .213   .110   .053
$L_2^2(2|\eta,\xi_2) = L_1^2(1|\eta,\xi_2)T_2^2(1|\eta,\xi_2)$
                                    .001   .003   .014   ...   .248   ...   .773   .887   .947

Initialize Cluster 3's summed score likelihoods by adding Item 5

Within-cluster                      Quadrature grid for ($\eta$, $\xi_3$)
score likelihood        $\eta$        -2     -2     -2   ...      0   ...      2      2      2
                        $\xi_3$       -2     -1      0   ...      0   ...      0      1      2
$L_1^3(0|\eta,\xi_3) = T_1^3(0|\eta,\xi_3)$
                                    .968   .900   .731   ...   .354   ...   .100   .032   .010
$L_1^3(1|\eta,\xi_3) = T_1^3(1|\eta,\xi_3)$
                                    .032   .100   .269   ...   .646   ...   .900   .968   .990

Add Item 6 to Cluster 3's summed score likelihoods

$L_2^3(0|\eta,\xi_3) = L_1^3(0|\eta,\xi_3)T_2^3(0|\eta,\xi_3)$
                                    .922   .773   .472   ...   .095   ...   .007   .001   .000
$L_2^3(1|\eta,\xi_3) = L_1^3(0|\eta,\xi_3)T_2^3(1|\eta,\xi_3) + L_1^3(1|\eta,\xi_3)T_2^3(0|\eta,\xi_3)$
                                    .077   .213   .433   ...   .433   ...   .155   .053   .017
$L_2^3(2|\eta,\xi_3) = L_1^3(1|\eta,\xi_3)T_2^3(1|\eta,\xi_3)$
                                    .002   .014   .095   ...   .472   ...   .838   .947   .983


Table 7

Integrating the specific dimensions out of the summed score likelihoods

Dimensions                                       Likelihoods

Multiply Cluster 1's Summed Score Likelihoods by $W_1(\xi_1)$

                                    Quadrature Grid for ($\eta$, $\xi_1$)
                        $\eta$        -2     -2     -2   ...      0   ...      2      2      2
                        $\xi_1$       -2     -1      0   ...      0   ...      0      1      2
$L^1(0|\eta,\xi_1)W_1(\xi_1) = L_2^1(0|\eta,\xi_1)W_1(\xi_1)$
                                    .054   .237   .371   ...   .190   ...   .011   .001   .000
$L^1(1|\eta,\xi_1)W_1(\xi_1) = L_2^1(1|\eta,\xi_1)W_1(\xi_1)$
                                    .001   .007   .031   ...   .174   ...   .114   .032   .003
$L^1(2|\eta,\xi_1)W_1(\xi_1) = L_2^1(2|\eta,\xi_1)W_1(\xi_1)$
                                    .000   .000   .001   ...   .038   ...   .277   .211   .052

Summing over $\xi_1$, Leaving Cluster 1's Summed Score Likelihoods as Functions of $\eta$ Only

                                                          $\eta$     -2     -1      0      1      2
$L^1(0|\eta) = \sum_{\xi_1} L^1(0|\eta,\xi_1)W_1(\xi_1)$            .891   .728   .469   .212   .062
$L^1(1|\eta) = \sum_{\xi_1} L^1(1|\eta,\xi_1)W_1(\xi_1)$            .103   .235   .382   .411   .288
$L^1(2|\eta) = \sum_{\xi_1} L^1(2|\eta,\xi_1)W_1(\xi_1)$            .006   .037   .148   .377   .649

Multiply Cluster 2's Summed Score Likelihoods by $W_2(\xi_2)$

                                    Quadrature Grid for ($\eta$, $\xi_2$)
                        $\eta$        -2     -2     -2   ...      0   ...      2      2      2
                        $\xi_2$       -2     -1      0   ...      0   ...      0      1      2
$L^2(0|\eta,\xi_2)W_2(\xi_2) = L_2^2(0|\eta,\xi_2)W_2(\xi_2)$
                                    .052   .217   .311   ...   .100   ...   .006   .001   .000
$L^2(1|\eta,\xi_2)W_2(\xi_2) = L_2^2(1|\eta,\xi_2)W_2(\xi_2)$
                                    .003   .027   .086   ...   .203   ...   .086   .027   .003
$L^2(2|\eta,\xi_2)W_2(\xi_2) = L_2^2(2|\eta,\xi_2)W_2(\xi_2)$
                                    .000   .001   .006   ...   .100   ...   .311   .217   .052

Summing over $\xi_2$, Leaving Cluster 2's Summed Score Likelihoods as Functions of $\eta$ Only

                                                          $\eta$     -2     -1      0      1      2
$L^2(0|\eta) = \sum_{\xi_2} L^2(0|\eta,\xi_2)W_2(\xi_2)$            .742   .519   .277   .106   .028
$L^2(1|\eta) = \sum_{\xi_2} L^2(1|\eta,\xi_2)W_2(\xi_2)$            .230   .375   .446   .375   .230
$L^2(2|\eta) = \sum_{\xi_2} L^2(2|\eta,\xi_2)W_2(\xi_2)$            .028   .106   .277   .519   .742

Multiply Cluster 3's Summed Score Likelihoods by $W_3(\xi_3)$

                                    Quadrature Grid for ($\eta$, $\xi_3$)
                        $\eta$        -2     -2     -2   ...      0   ...      2      2      2
                        $\xi_3$       -2     -1      0   ...      0   ...      0      1      2
$L^3(0|\eta,\xi_3)W_3(\xi_3) = L_2^3(0|\eta,\xi_3)W_3(\xi_3)$
                                    .050   .189   .190   ...   .038   ...   .003   .000   .000
$L^3(1|\eta,\xi_3)W_3(\xi_3) = L_2^3(1|\eta,\xi_3)W_3(\xi_3)$
                                    .004   .052   .174   ...   .174   ...   .062   .013   .001
$L^3(2|\eta,\xi_3)W_3(\xi_3) = L_2^3(2|\eta,\xi_3)W_3(\xi_3)$
                                    .000   .003   .038   ...   .190   ...   .337   .231   .054

Summing over $\xi_3$, Leaving Cluster 3's Summed Score Likelihoods as Functions of $\eta$ Only

                                                          $\eta$     -2     -1      0      1      2
$L^3(0|\eta) = \sum_{\xi_3} L^3(0|\eta,\xi_3)W_3(\xi_3)$            .469   .302   .166   .077   .029
$L^3(1|\eta) = \sum_{\xi_3} L^3(1|\eta,\xi_3)W_3(\xi_3)$            .364   .396   .364   .285   .192
$L^3(2|\eta) = \sum_{\xi_3} L^3(2|\eta,\xi_3)W_3(\xi_3)$            .166   .302   .469   .638   .779
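
The first stage shown in Tables 6 and 7 (accumulate within a cluster on the two-dimensional grid, then sum over the specific dimension) can be sketched as follows for dichotomous items. This is an illustration under stated assumptions rather than the author's code: the logistic traceline with predictor $a^0\eta + a^s\xi_s + c$ is assumed, and `cluster_likelihoods` is a hypothetical helper name.

```python
import math

NODES = [-2.0, -1.0, 0.0, 1.0, 2.0]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_weights(nodes):
    """Standard normal ordinates, normalized to sum to 1 over the nodes."""
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return [d / total for d in dens]


def cluster_likelihoods(items, eta, nodes, weights):
    """Return L^n(k | eta) for one cluster of dichotomous items.

    items: list of (a0, a_s, c) tuples (assumed logistic tracelines).
    For each node xi, the within-cluster summed score likelihoods are
    accumulated item by item; the results are then summed over xi
    with the quadrature weights.
    """
    out = [0.0] * (len(items) + 1)
    for xi, w in zip(nodes, weights):
        lik = [1.0]                               # score 0 before any item
        for a0, a_s, c in items:
            p = logistic(a0 * eta + a_s * xi + c)
            new = [0.0] * (len(lik) + 1)
            for s, val in enumerate(lik):
                new[s] += val * (1.0 - p)         # item scored 0
                new[s + 1] += val * p             # item scored 1
            lik = new
        for k, val in enumerate(lik):
            out[k] += val * w                     # integrate out xi
    return out


W = normal_weights(NODES)
cluster3 = [(0.8, 1.2, 0.6), (0.8, 1.2, 1.0)]     # Items 5 and 6, Table 4
for eta in NODES:
    row = cluster_likelihoods(cluster3, eta, NODES, W)
    print(eta, [round(v, 3) for v in row])
```

Each cluster only ever requires a two-dimensional grid ($\eta$ crossed with its own $\xi_n$), which is the source of the dimension reduction.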


Finally, the accumulated summed score likelihoods in each item cluster are used in the
second stage of the recursive algorithm, as shown in Table 8. The within-cluster summed scores
are treated as though they are item scores for 3 polytomous items. At the end of the recursions
the final summed score likelihoods for the primary dimension $\eta$ are assembled and multiplied by
the weights from the prior distribution of $\eta$, yielding posterior probabilities, expectations, and
variances, as shown in Table 9. The entries under the heading Posterior Summaries form a
summed score to IRT scaled score translation table (along with standard errors) for the primary
dimension in an item bifactor model.
Table 8

Forming summed score likelihoods for the primary dimension

Summed score likelihoods                                  $\eta$     -2     -1      0      1      2

Step 1: Initialize summed score likelihoods by adding Item Cluster 1

$\tilde{L}_1(0|\eta) = L^1(0|\eta)$                                 .891   .728   .469   .212   .062
$\tilde{L}_1(1|\eta) = L^1(1|\eta)$                                 .103   .235   .382   .411   .288
$\tilde{L}_1(2|\eta) = L^1(2|\eta)$                                 .006   .037   .148   .377   .649

Step 2: Add Item Cluster 2 to existing summed score likelihoods

$\tilde{L}_2(0|\eta) = \tilde{L}_1(0|\eta)L^2(0|\eta)$
                                                                    .661   .378   .130   .022   .002
$\tilde{L}_2(1|\eta) = \tilde{L}_1(0|\eta)L^2(1|\eta) + \tilde{L}_1(1|\eta)L^2(0|\eta)$
                                                                    .281   .395   .315   .123   .022
$\tilde{L}_2(2|\eta) = \tilde{L}_1(0|\eta)L^2(2|\eta) + \tilde{L}_1(1|\eta)L^2(1|\eta) + \tilde{L}_1(2|\eta)L^2(0|\eta)$
                                                                    .053   .184   .342   .304   .131
$\tilde{L}_2(3|\eta) = \tilde{L}_1(1|\eta)L^2(2|\eta) + \tilde{L}_1(2|\eta)L^2(1|\eta)$
                                                                    .004   .039   .172   .355   .363
$\tilde{L}_2(4|\eta) = \tilde{L}_1(2|\eta)L^2(2|\eta)$
                                                                    .000   .004   .041   .196   .482

Step 3: Add Item Cluster 3 to existing summed score likelihoods

$\tilde{L}_3(0|\eta) = \tilde{L}_2(0|\eta)L^3(0|\eta)$
                                                                    .310   .114   .022   .002   .000
$\tilde{L}_3(1|\eta) = \tilde{L}_2(0|\eta)L^3(1|\eta) + \tilde{L}_2(1|\eta)L^3(0|\eta)$
                                                                    .373   .269   .100   .016   .001
$\tilde{L}_3(2|\eta) = \tilde{L}_2(0|\eta)L^3(2|\eta) + \tilde{L}_2(1|\eta)L^3(1|\eta) + \tilde{L}_2(2|\eta)L^3(0|\eta)$
                                                                    .237   .326   .233   .073   .010
$\tilde{L}_3(3|\eta) = \tilde{L}_2(1|\eta)L^3(2|\eta) + \tilde{L}_2(2|\eta)L^3(1|\eta) + \tilde{L}_2(3|\eta)L^3(0|\eta)$
                                                                    .068   .204   .301   .192   .053
$\tilde{L}_3(4|\eta) = \tilde{L}_2(2|\eta)L^3(2|\eta) + \tilde{L}_2(3|\eta)L^3(1|\eta) + \tilde{L}_2(4|\eta)L^3(0|\eta)$
                                                                    .010   .072   .230   .310   .186
$\tilde{L}_3(5|\eta) = \tilde{L}_2(3|\eta)L^3(2|\eta) + \tilde{L}_2(4|\eta)L^3(1|\eta)$
                                                                    .001   .013   .096   .282   .375
$\tilde{L}_3(6|\eta) = \tilde{L}_2(4|\eta)L^3(2|\eta)$
                                                                    .000   .001   .019   .125   .375
Table 9

Characterizing the summed score likelihoods and posteriors for the primary dimension

                                                  Likelihoods                    Posterior summaries

Quadrature weights at $\eta$ =               -2     -1      0      1      2
$W(\eta)$ =                                 .054   .244   .403   .244   .054

Summed score likelihoods
$L(s|\eta)$ at $\eta$ =                      -2     -1      0      1      2
$L(0|\eta)$ =                               .310   .114   .022   .002   .000
$L(1|\eta)$ =                               .373   .269   .100   .016   .001
$L(2|\eta)$ =                               .237   .326   .233   .073   .010
$L(3|\eta)$ =                               .068   .204   .301   .192   .053
$L(4|\eta)$ =                               .010   .072   .230   .310   .186
$L(5|\eta)$ =                               .001   .013   .096   .282   .375
$L(6|\eta)$ =                               .000   .001   .019   .125   .375

Unnormalized summed score
posteriors $p(\eta|s)$ at $\eta$ =           -2     -1      0      1      2    $p(s)$  $E(\eta|s)$  $V(\eta|s)$
$p(\eta|0) \propto L(0|\eta)W(\eta)$ =      .017   .028   .009   .000   .000     .05     -1.14        .49
$p(\eta|1) \propto L(1|\eta)W(\eta)$ =      .020   .066   .040   .004   .000     .13      -.79        .54
$p(\eta|2) \propto L(2|\eta)W(\eta)$ =      .013   .080   .094   .018   .001     .20      -.42        .56
$p(\eta|3) \propto L(3|\eta)W(\eta)$ =      .004   .050   .121   .047   .003     .22      -.02        .55
$p(\eta|4) \propto L(4|\eta)W(\eta)$ =      .001   .018   .093   .076   .010     .20       .39        .54
$p(\eta|5) \propto L(5|\eta)W(\eta)$ =      .000   .003   .039   .069   .020     .13       .81        .52
$p(\eta|6) \propto L(6|\eta)W(\eta)$ =      .000   .000   .008   .030   .020     .06      1.21        .46


Some Additional Comparisons 

Without the updated Lord-Wingersky algorithm, it may be tempting in practice to calibrate
a test using a hierarchical item factor model (e.g., testlet model) to “handle” residual dependence,
retain the general dimension slopes, and create a summed score to scaled score conversion table
with the original unidimensional Lord-Wingersky algorithm. While this approach has a certain
intuitive appeal, and the computation is simpler than the updated Lord-Wingersky algorithm, it
nevertheless leads to incorrect results. Failing to take into account the influence of
residual dependence (as indicated by the presence of specific dimensions) in IRT scoring can still
lead to an overstatement of the degree of reliability of the instrument. Recent work by Ip (2010a,
2010b) and Stucky, Thissen, and Edelen (2013) also highlights the effects residual dependence
has on scaled scores and standard errors.


Table 10

Summed score to scaled score conversions based
on primary dimension slopes only

Summed           Posterior summaries
scores      $p(s)$   $E(\eta|s)$   $V(\eta|s)$
s = 0        0.05      -1.29          0.40
s = 1        0.13      -0.90          0.46
s = 2        0.20      -0.47          0.46
s = 3        0.22      -0.03          0.44
s = 4        0.20       0.42          0.44
s = 5        0.14       0.89          0.43
s = 6        0.06       1.33          0.37


Notably, the marginal reliability coefficient can become substantially overestimated. In the
case of the illustrative example presented in the An Illustrative Example section, $\sigma^2(\eta)$ is equal
to 1 because the prior $h(\eta)$ is standard normal. Applying Equation (12) to results in Table 9, the
marginal reliability of the scaled scores for the primary dimension $\eta$ is equal to 0.47. On the
other hand, if only the general dimension slopes in Table 4 are retained and the standard
Lord-Wingersky algorithm is applied to obtain a one-dimensional summed score conversion table (as
shown in Table 10), the marginal reliability of the scaled scores for summed scores becomes
0.56, an almost 20% upward bias relative to the reliability estimate from the more appropriate
scoring method.
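
The two reliability figures can be recovered from the rounded entries of Tables 9 and 10. Equation (12) is not reproduced in this section, so the sketch below assumes it has the usual form of one minus the expected posterior (error) variance divided by the prior variance; that form is an assumption, and `marginal_reliability` is a hypothetical name.

```python
def marginal_reliability(p_s, var_s, prior_var=1.0):
    """1 - (expected posterior variance) / (prior variance).

    Assumed form of Equation (12); p_s are the summed score
    probabilities and var_s the posterior variances V(eta|s).
    """
    avg_error_var = sum(p * v for p, v in zip(p_s, var_s))
    return 1.0 - avg_error_var / prior_var


# p(s) and V(eta|s) for s = 0, ..., 6, copied from Tables 9 and 10.
table9_p = [.05, .13, .20, .22, .20, .13, .06]
table9_v = [.49, .54, .56, .55, .54, .52, .46]
table10_p = [.05, .13, .20, .22, .20, .14, .06]
table10_v = [.40, .46, .46, .44, .44, .43, .37]

rel_bifactor = marginal_reliability(table9_p, table9_v)
rel_unidim = marginal_reliability(table10_p, table10_v)
print(round(rel_bifactor, 2), round(rel_unidim, 2))
print(round(rel_unidim / rel_bifactor, 2))   # ratio of the two estimates
```

Because the tabled inputs are rounded to two decimals, the result is only expected to agree with the reported coefficients to about two decimals.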

Furthermore, the estimates of scaled scores are also impacted. A comparison between
Tables 9 and 10 shows that the posterior means become more extreme in general when the
specific dimension slopes are ignored and the unidimensional scoring algorithm is used. This is
natural since the item intercepts and slopes are unstandardized parameters. When the (typically
positive) specific dimension slopes are ignored and the intercepts remain untouched, the implied
standardized threshold parameters become more extreme, leading to posteriors that are
positioned more toward the extreme ends of the latent trait scale.
Additional Applications


Besides summed score based IRT scoring tables, the updated Lord-Wingersky algorithm 
can be applied creatively to solve a test linking problem (see Thissen et al., 2011), to create score 
combination tables for mixed format tests, and to construct model fit test statistics. Discussed in 
this section are only selections of the new possibilities opened up by the updated algorithm. 

Calibrated Projection Linking 

Thissen et al. (2011) described a novel test linking method called calibrated projection that
fuses simultaneous calibration with projection linking. The main advantage of calibrated
projection is its ability to link two closely related (though not conceptually identical) scales in a
single step that is entirely based on multidimensional IRT calibration. Thissen et al. (2011)
illustrated the application of calibrated projection in health outcomes research, wherein a legacy
instrument (PedsQL™ Asthma Symptoms Module) was projection linked onto the scale of the
new Pediatric Asthma Impact Scale (PAIS). PAIS was built with IRT methods, whereas
PedsQL™ was built with classical test theory methods, thus requiring the use of summed
scoring. Producing a scoring cross-walk would enable the clinicians and researchers who already
use PedsQL™ to report scaled scores comparable to PAIS.

As illustrated by Thissen et al.’s (2011) Tables 2 and 3, both instruments use 5-point
ordered response scales suitable for the graded response model and each may be considered
approximately unidimensional. PedsQL™ Asthma Symptoms Module contains 11 items and
PAIS has 17. A multitude of additional differences between the two instruments implies that the
more stringent requirements of concurrent calibration (e.g., equal construct) are probably not
satisfied. Hence the weaker prediction/projection methods must be employed.

At the core of calibrated projection linking is a multidimensional IRT model that has at
least 2 correlated primary dimensions ($\eta_1$ and $\eta_2$), each measured by the respective instrument
(PAIS and PedsQL™) with an independent cluster factor pattern. The correlation between $\eta_1$ and
$\eta_2$ is estimated simultaneously with the item parameters. The multidimensional IRT model then
produces scores (projected through the correlation) on the scale of one instrument (PAIS in this
case) using only the responses to items from the other instrument (PedsQL™ Asthma Symptoms
Module). This model, when depicted in a graph, resembles the bottom half of the path diagram
shown in Figure 2.
Figure 2. Path diagram of a two-tier model for calibrated projection linking. The two primary dimensions are
correlated at .96 and there are 6 item doublets.


However, when the two instruments were considered together, strong local dependence
emerged among 6 pairs of items. As it turns out, these 6 pairs of items have stem wording that
is virtually identical. For example, item 13 of PAIS reads “I had asthma attacks,” and item 3 of
PedsQL™ Asthma Symptoms Module reads “I have asthma attacks.” The 6 items in fact
represent some of the best symptoms that are indicative of asthma’s impact. Consequently,
Thissen et al. (2011) suggested including 6 orthogonal latent variables to account for the effects
of local dependence. This model is depicted in Figure 2. It is formally a two-tier model with
$M = 2$ primary dimensions and $N = 6$ specific dimensions. The two primary dimensions are
assumed to be bivariate normal, standardized in each dimension, with an unknown correlation
coefficient. Thissen et al. (2011) obtained a linking sample and estimated the correlation
coefficient ($r = 0.96$) as well as the item parameters for both instruments.
Table 11

Item parameters for the 11 PedsQL™ items as input into the Lord-Wingersky 2.0 algorithm

                               Slopes                                             Intercepts
Item  $\eta_1$ $\eta_2$ $\xi_1$ $\xi_2$ $\xi_3$ $\xi_4$ $\xi_5$ $\xi_6$        1       2       3       4
 1       0      2.31     0       0       0       0       0       0           0.77   -0.56   -3.19   -5.50
 2       0      3.90     0      2.37     0       0       0       0           1.50   -1.05   -5.83   -8.24
 3       0      4.09    3.85     0       0       0       0       0          -2.04   -4.89   -9.10  -12.15
 4       0      1.70     0       0       0       0       0       0          -0.48   -1.20   -2.84   -3.68
 5       0      2.25     0       0       0       0       0       0           2.05    0.69   -2.14   -3.82
 6       0      2.63     0       0       0       0       0      2.52         4.44    2.17   -1.70   -4.08
 7       0      3.42     0       0      2.04     0       0       0           1.79   -0.65   -4.59   -7.02
 8       0      1.07     0       0       0       0       0       0           1.64    0.55   -1.29   -2.29
 9       0      3.11     0       0       0       0      1.66     0          -0.17   -1.88   -4.11   -5.82
10       0      3.36     0       0       0      4.06     0       0          -1.91   -4.02   -7.34   -9.21
11       0      2.19     0       0       0       0       0       0           0.14   -1.18   -3.44   -5.02


Retaining the item parameters for PedsQL™ reported in Thissen et al. (2011), it is
straightforward to apply the updated Lord-Wingersky algorithm. Table 11 shows the item
parameters for the 11 PedsQL™ items. The slopes on the first general dimension $\eta_1$,
representing PAIS, are all equal to zero here, indicating the absence of items that cross-load on
both dimensions. The PAIS item slopes do not enter into the projection linking computations
because only items from PedsQL™ are considered (along with the 0.96 prior correlation). The
non-zero slopes for the 6 specific dimensions ($\xi_1$ to $\xi_6$) are what remain of the item doublet
slopes after removing their counterparts among the PAIS items.
Figure 3. Bivariate contour plots showing 3 selected summed score posteriors for PedsQL™ Asthma Symptoms
Module as well as the projected posteriors on the PAIS scale. (x-axis: PedsQL Asthma Symptoms Module, $\eta_2$.)

For each summed score ($s = 0, \ldots, 44$) on PedsQL™, the recursive algorithm produces a
bivariate posterior for $\eta_1$ and $\eta_2$. Figure 3 shows the bivariate normal approximations to 3
selected posteriors, for summed scores 0, 20, and 44, overlaid on the gray contours representing
the bivariate normal prior with an estimated correlation of 0.96. The x-axis of Figure 3 represents
the PedsQL™ latent variable ($\eta_2$) whereas the y-axis represents PAIS ($\eta_1$), consistent with the
notation in Figure 2. The marginal posteriors are also plotted, indicating that entire summed
score posteriors are projected through the bivariate relation between $\eta_1$ and $\eta_2$. The marginal
posteriors on the y-axis are of key interest. Their relative sizes indicate the model-implied
summed score proportions. Their means and variances become scores and error variances on the
scale of PAIS for each PedsQL™ summed score, corrected for local dependence.
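
As a worked illustration of the projection, suppose (as Table 11 indicates) that the PedsQL™ items depend on $\eta_1$ only through $\eta_2$, and that $(\eta_1, \eta_2)$ is standard bivariate normal with correlation $r$. Under those assumptions $\eta_1$ is conditionally independent of the summed score given $\eta_2$, and the laws of total expectation and total variance give the projected moments below. This is a derivation under the stated assumptions, not a description of how the reported computations were carried out; with fixed quadrature the relations hold only up to quadrature error.

```latex
\begin{aligned}
E(\eta_1 \mid s) &= E\{E(\eta_1 \mid \eta_2) \mid s\} = r\,E(\eta_2 \mid s), \\
V(\eta_1 \mid s) &= E\{V(\eta_1 \mid \eta_2) \mid s\} + V\{E(\eta_1 \mid \eta_2) \mid s\}
                  = (1 - r^2) + r^2\,V(\eta_2 \mid s).
\end{aligned}
```

With $r = 0.96$, the constant term is $1 - r^2 = 0.0784$ and the multiplier on $V(\eta_2 \mid s)$ is $r^2 = 0.9216$, so the projected error variance on the PAIS scale can never fall below 0.0784.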

Score Combination 

Modern educational assessments are often made up of items of varying types. For instance, 
a test may consist of traditional multiple-choice (MC) items that are dichotomously scored, for 
which the classical 3-parameter IRT model may be useful, as well as items that require judge- 
rated constructed responses (CR) or performance tasks that are subsequently analyzed using the 
graded response model (Samejima, 1969) or the generalized partial credit model (Muraki, 1992). 
When the MC items and the CR items measure the same latent construct and the test is 
approximately unidimensional, reporting a single combined score is a sensible approach. Rosa et 
al. (2001) proposed a score combination method that is based on the pattern of summed scores 
from the MC and CR sections. This is a convenient and practical approximation to the optimal 
(but more involved) scoring with the full response pattern. 

Specifically, let the summed score likelihoods for the MC section be $L_{MC}(s|\theta)$,
$s = 0, \ldots, S_{MC}$, where $S_{MC}$ is the maximum summed score for the MC section. Similarly, let
$L_{CR}(s|\theta)$, $s = 0, \ldots, S_{CR}$, denote the summed score likelihoods for the CR section. Rosa et al.
(2001) state that the following summed score pattern posterior provides a basis for combining MC
section score $s_1$ with CR section score $s_2$:

$$p(\theta|s_1, s_2) = \frac{L_{MC}(s_1|\theta)\,L_{CR}(s_2|\theta)\,g(\theta)}{\int L_{MC}(s_1|\theta)\,L_{CR}(s_2|\theta)\,g(\theta)\,d\theta}. \tag{22}$$

To compute the posterior, Rosa et al. (2001) noted that one would have to apply the standard
Lord-Wingersky algorithm to the two sections separately and then explicitly use Equation (22) to
construct a two-way look-up table for each of the summed score patterns.

If one regards the MC section as a testlet, and the CR section as another one, one may
choose to rewrite Equation (22) as:

$$p(\theta|s_1, s_2) = \frac{\int L_{MC}(s_1|\theta, \xi_1)\,g_1(\xi_1)\,d\xi_1 \int L_{CR}(s_2|\theta, \xi_2)\,g_2(\xi_2)\,d\xi_2\; h(\theta)}{\int \left[\int L_{MC}(s_1|\theta, \xi_1)\,g_1(\xi_1)\,d\xi_1 \int L_{CR}(s_2|\theta, \xi_2)\,g_2(\xi_2)\,d\xi_2\right] h(\theta)\,d\theta}. \tag{23}$$

Note that the key condition for $p(\theta|s_1, s_2)$ in Equation (22) to be the same as Equation (23) is:
$L_{MC}(s_1|\theta, \xi_1) = L_{MC}(s_1|\theta)$ and $L_{CR}(s_2|\theta, \xi_2) = L_{CR}(s_2|\theta)$ (with the prior density $g(\theta)$ of
Equation (22) playing the role of $h(\theta)$). In other words, the two are the same
when items in both MC and CR sections do not depend on the specific dimensions $\xi_1$ and $\xi_2$; or,
alternatively, when the item slopes on $\xi_1$ and $\xi_2$ are all equal to zero. The equivalence suggests
that one does not need a specialized algorithm for implementing Rosa et al.’s (2001) score
combination method. One would simply have to set up a special bifactor model wherein all
specific dimension slopes are constrained to zero and apply Version 2.0 of the Lord-Wingersky
algorithm (outlined in this paper’s Lord-Wingersky Algorithm Version 2.0 section) to this
bifactor model. Although the specific dimension slopes may be zero, the presence of the testlet
structure enables the first stage of the updated Lord-Wingersky algorithm to accumulate the
within-section summed score likelihoods separately. Instead of collapsing the section-specific
summed scores as per Equation (21), the pattern of summed scores is used to compute a posterior
for the primary dimension directly.
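
The pattern-based posterior can be tabulated directly once the section likelihoods are available on a common grid. The sketch below is illustrative only: `combination_table` is a hypothetical name, and, purely as stand-in inputs, the rounded likelihoods of Clusters 1 and 2 from Table 7 play the role of two "sections" (they are not the Wisconsin reading data).

```python
def combination_table(lik_a, lik_b, nodes, weights):
    """Posterior means E(theta | s1, s2) for every summed score pattern.

    lik_a[s1][q] and lik_b[s2][q] are section summed score likelihoods
    (specific dimensions already integrated out) on a common quadrature
    grid; weights[q] are normalized prior ordinates.
    """
    table = []
    for row_a in lik_a:
        row = []
        for row_b in lik_b:
            # numerator of the pattern posterior at each node
            post = [a * b * w for a, b, w in zip(row_a, row_b, weights)]
            p = sum(post)                        # denominator
            row.append(sum(x * v for x, v in zip(nodes, post)) / p)
        table.append(row)
    return table


nodes = [-2, -1, 0, 1, 2]
weights = [.054, .244, .403, .244, .054]
# Stand-in section likelihoods: Clusters 1 and 2 from Table 7 (rounded).
sec_a = [[.891, .728, .469, .212, .062],
         [.103, .235, .382, .411, .288],
         [.006, .037, .148, .377, .649]]
sec_b = [[.742, .519, .277, .106, .028],
         [.230, .375, .446, .375, .230],
         [.028, .106, .277, .519, .742]]

table = combination_table(sec_a, sec_b, nodes, weights)
for row in table:
    print([round(v, 2) for v in row])
```

The difference from the second-stage recursion is only that the score pairs $(s_1, s_2)$ are kept separate instead of being collapsed onto $s_1 + s_2$.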


Table 12

Item parameters for the 20 Wisconsin 3rd grade reading items as input into the
Lord-Wingersky 2.0 algorithm

Multiple-choice items (3PL Model)

                  Slopes
Item   $\theta$   $\xi_1$   $\xi_2$   Intercept   Guessing
 1       1.02       0         0         0.72        0.20
 2       2.16       0         0         2.99        0.31
 3       2.29       0         0         2.72        0.22
 4       1.47       0         0         1.37        0.23
 5       2.29       0         0         0.92        0.23
 6       3.61       0         0         1.83        0.19
 7       2.05       0         0         1.12        0.23
 8       2.60       0         0         3.36        0.28
 9       1.47       0         0         1.36        0.20
10       2.76       0         0         1.68        0.18
11       1.88       0         0         1.84        0.22
12       2.27       0         0         0.84        0.28
13       1.46       0         0         1.11        0.20
14       3.9        0         0         1.81        0.25
15       1.56       0         0         0.14        0.26
16       1.62       0         0         2.02        0.21

Rated constructed response items (Graded Model)

                  Slopes                    Intercepts
Item   $\theta$   $\xi_1$   $\xi_2$      1       2       3
 1       0.87       0         0         4.29    2.48   -1.01
 2       0.93       0         0         4.15    1.33   -1.06
 3       1.31       0         0         4.47    2.31    0.69
 4       0.73       0         0         4.05    1.27   -1.63


Table 13

Summed score combination table computed by the updated recursive algorithm for the Wisconsin reading items

Summed
score for                            Summed rated score for CR items
MC items     0      1      2      3      4      5      6      7      8      9     10     11     12
    0     -3.28  -3.05  -2.84  -2.66  -2.50  -2.35  -2.22  -2.11  -2.01  -1.92  -1.85  -1.79  -1.73
    1     -3.23  -2.98  -2.77  -2.58  -2.42  -2.27  -2.13  -2.01  -1.91  -1.82  -1.75  -1.68  -1.62
    2     -3.17  -2.91  -2.69  -2.50  -2.32  -2.17  -2.03  -1.91  -1.80  -1.71  -1.63  -1.57  -1.51
    3     -3.10  -2.83  -2.59  -2.39  -2.22  -2.06  -1.92  -1.79  -1.68  -1.59  -1.51  -1.45  -1.38
    4     -3.01  -2.72  -2.48  -2.27  -2.09  -1.93  -1.79  -1.66  -1.56  -1.46  -1.38  -1.32  -1.25
    5     -2.90  -2.59  -2.34  -2.12  -1.94  -1.78  -1.64  -1.52  -1.42  -1.33  -1.25  -1.18  -1.12
    6     -2.75  -2.43  -2.16  -1.95  -1.77  -1.62  -1.49  -1.37  -1.27  -1.19  -1.11  -1.05  -0.99
    7     -2.55  -2.21  -1.95  -1.75  -1.59  -1.45  -1.33  -1.22  -1.13  -1.05  -0.98  -0.91  -0.86
    8     -2.29  -1.95  -1.71  -1.53  -1.39  -1.27  -1.16  -1.07  -0.98  -0.90  -0.83  -0.77  -0.72
    9     -1.94  -1.64  -1.44  -1.30  -1.18  -1.08  -0.99  -0.91  -0.83  -0.76  -0.69  -0.63  -0.57
   10     -1.54  -1.32  -1.18  -1.07  -0.98  -0.90  -0.82  -0.75  -0.67  -0.60  -0.53  -0.47  -0.41
   11     -1.15  -1.02  -0.93  -0.85  -0.78  -0.72  -0.65  -0.58  -0.51  -0.44  -0.37  -0.30  -0.23
   12     -0.83  -0.76  -0.70  -0.65  -0.59  -0.53  -0.47  -0.40  -0.33  -0.25  -0.18  -0.09  -0.01
   13     -0.57  -0.53  -0.49  -0.44  -0.39  -0.34  -0.28  -0.21  -0.13  -0.05   0.05   0.16   0.27
   14     -0.33  -0.30  -0.27  -0.23  -0.18  -0.13  -0.07   0.01   0.10   0.20   0.33   0.47   0.63
   15     -0.10  -0.08  -0.04   0.00   0.05   0.11   0.18   0.27   0.38   0.51   0.67   0.87   1.11
   16      0.15   0.18   0.21   0.26   0.32   0.39   0.48   0.59   0.72   0.89   1.11   1.37   1.70


As a concrete example, consider the Wisconsin 3rd grade reading assessment items
discussed in Rosa et al. (2001). There are altogether 20 items, 16 in the MC section (scored 0-1)
and 4 in the CR section (each has 4 score points). Using the item parameters reported by Thissen
and Wainer (2001), one may set up a bifactor model with two empty specific dimensions (as
shown in Table 12). Application of the updated Lord-Wingersky algorithm to the model in Table
12 leads to a two-way table (Table 13) that (almost) reproduces Table 7.2 (p. 259) in Rosa et al.
(2001), with any difference attributable to the limited number of significant digits in the reported
item parameters and to numerical quadrature error.

While the foregoing may be deemed a convenient trick for tests that are unidimensional, it
does offer a degree of generality that Rosa et al.’s (2001) original method did not possess. That
is, when the MC or CR sections demonstrate departures from unidimensionality (e.g., when there
is a testing mode effect for the CR items, and the specific slopes may not be exactly zero), the new
algorithm will properly adjust the combined scaled score for residual dependence, requiring no
new specialized implementation.

Model Fit Evaluation 


Ever since summed score probabilities could be evaluated for unidimensional IRT models,
researchers have explored their use in model fit diagnosis. Orlando and Thissen’s (2000)
summed score likelihood based item fit statistic is one prominent example. Consider item
$i = 1, \ldots, I$ with $K_i$ categories. Recall that the maximum summed score is still $S = \sum_{i=1}^{I}(K_i - 1)$.
One may compute the “rest score” likelihoods, i.e., the summed score likelihoods based on all
items except $i$. Let $L_{(i)}(s|\theta)$, $s = 0, \ldots, S - (K_i - 1)$, denote the rest score likelihoods for item $i$.

For this item, the posterior probability for category $k$ in rest score group $s$ is

$$P_{ik}(s) = \int L_{(i)}(s|\theta)\,T_i(k|\theta)\,g(\theta)\,d\theta. \tag{24}$$

The posterior probability for rest score group $s$ is

$$P_{(i)}(s) = \int L_{(i)}(s|\theta)\,g(\theta)\,d\theta. \tag{25}$$

Therefore the model-implied probability of endorsing category $k$ if the rest score is $s$ can be
computed as $E_{ik}(s) = P_{ik}(s)/P_{(i)}(s)$. The observed probability of endorsing category $k$ if the
rest score is $s$ can be found by tabulating the calibration data. Let it be denoted as $O_{ik}(s)$.
Orlando and Thissen (2000) noted that a Pearson-type statistic may be constructed as follows:

$$S\text{-}X_i^2 = \text{Sample Size} \times \sum_{s=0}^{S-(K_i-1)} O_{(i)}(s) \sum_{k=0}^{K_i-1} \frac{[O_{ik}(s) - E_{ik}(s)]^2}{E_{ik}(s)}, \tag{26}$$

where $O_{(i)}(s)$ is the observed counterpart to $P_{(i)}(s)$. They presented simulation evidence that the
large sample distribution of $S\text{-}X_i^2$ (at least in the dichotomous case) can be well approximated
by a central chi-square distribution with $S - (K_i - 1) - q_i$ degrees-of-freedom, where $q_i$ is the
number of freely estimated item parameters for item $i$.
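
A compact sketch of the rest-score computations for the unidimensional case follows. It is a hedged illustration: the function names are hypothetical, the three dichotomous items and the 3-point quadrature are made-up toy values, and the Pearson form used in `s_x2` (squared difference over expected, weighted by the observed rest score proportion and the sample size) is the reading of Equation (26) assumed here.

```python
def rest_score_likelihoods(tracelines, skip):
    """Lord-Wingersky recursion over all items except item `skip`.

    tracelines[i][k][q] = T_i(k | theta_q). Returns lik[s][q].
    """
    n_points = len(tracelines[0][0])
    lik = [[1.0] * n_points]
    for i, item in enumerate(tracelines):
        if i == skip:
            continue
        new = [[0.0] * n_points for _ in range(len(lik) + len(item) - 1)]
        for s, prev in enumerate(lik):
            for k, cat in enumerate(item):
                for q in range(n_points):
                    new[s + k][q] += prev[q] * cat[q]
        lik = new
    return lik


def expected_proportions(tracelines, i, weights):
    """Return P_(i)(s) and E_ik(s) = P_ik(s) / P_(i)(s)."""
    rest = rest_score_likelihoods(tracelines, i)
    p_rest, e_ik = [], []
    for row in rest:
        p_s = sum(l * w for l, w in zip(row, weights))
        p_rest.append(p_s)
        e_ik.append([sum(l * t * w for l, t, w in zip(row, cat, weights)) / p_s
                     for cat in tracelines[i]])
    return p_rest, e_ik


def s_x2(n, o_rest, o_ik, e_ik):
    """Pearson-type statistic in the assumed form of Equation (26)."""
    return n * sum(o_s * sum((o - e) ** 2 / e for o, e in zip(o_row, e_row))
                   for o_s, o_row, e_row in zip(o_rest, o_ik, e_ik))


# Toy example (hypothetical values): 3 dichotomous items, 3 quadrature points.
weights = [0.25, 0.5, 0.25]
p_correct = [[.2, .5, .8], [.3, .6, .9], [.1, .4, .7]]
tracelines = [[[1 - p for p in row], row] for row in p_correct]

p_rest, e_ik = expected_proportions(tracelines, 0, weights)
print([round(p, 3) for p in p_rest])
print([[round(e, 3) for e in row] for row in e_ik])
```

When the observed proportions equal the model-implied ones, the statistic is zero; in practice sparse rest score groups would typically need to be collapsed before the chi-square reference distribution is used.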

With the updated Lord-Wingersky algorithm, it is straightforward to generalize $S\text{-}X^2$ to 
hierarchical item factor models. Some additional book-keeping is necessary, however, to fully 
utilize dimension reduction. Consider item $i$ in cluster/testlet $n$. Let $L_{(n)}(s \mid \eta)$ denote the 
summed score likelihoods in terms of the primary dimensions $\eta$, accumulated over all item 
clusters other than cluster $n$. $L_{(n)}(s \mid \eta)$ is straightforward to compute by ignoring cluster $n$ after 
stage 1 of the recursions is completed. Recall that $r_n$ is the maximum within cluster score for 
cluster $n$. $L_{(n)}(s \mid \eta)$ is defined for $s = 0, \ldots, S - r_n$. Within cluster $n$, the summed score 
likelihoods without item $i$ are denoted $L^{n}_{(i)}(s \mid \eta, \xi_n)$. Note that the dependence on the specific 
dimension is not yet integrated out of the likelihood, and $L^{n}_{(i)}(s \mid \eta, \xi_n)$ is defined for 
$s = 0, \ldots, r_n - (K_i - 1)$. 

The posterior probability for category $k$ in rest score group $s$ is 

$$P_{ik}(s) = \int \sum_{s_1=0}^{S-r_n} L_{(n)}(s_1 \mid \eta) \int \sum_{s_2=0}^{r_n-(K_i-1)} L^{n}_{(i)}(s_2 \mid \eta, \xi_n)\, 1_s(s_1 + s_2)\, T_i(k \mid \eta, \xi_n)\, g_n(\xi_n)\, d\xi_n\, h(\eta)\, d\eta, \tag{27}$$

where $1_s(s_1 + s_2)$ is an indicator function that is equal to 1 if and only if $s = s_1 + s_2$, and 0 
otherwise. The inner summation is needed because it combines likelihoods from cluster $n$ while 
enforcing the constraint that the rest score must be $s$, before the dependence on specific 
dimension $n$ is integrated out. By analogy, the posterior probability for the rest score group $s$ is 
$$P_{(i)}(s) = \int \sum_{s_1=0}^{S-r_n} L_{(n)}(s_1 \mid \eta) \int \sum_{s_2=0}^{r_n-(K_i-1)} L^{n}_{(i)}(s_2 \mid \eta, \xi_n)\, 1_s(s_1 + s_2)\, g_n(\xi_n)\, d\xi_n\, h(\eta)\, d\eta. \tag{28}$$

Once the posterior probabilities are computed, they can be inserted into Equation (26) to evaluate 
a chi-square test statistic for item i. Li and Rupp (2011) examined a version of this index by 
simulation but did not discuss the recursive algorithm that is needed to compute $S\text{-}X^2$ for 
hierarchical item factor models in full generality. 
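
The book-keeping behind Equations (27) and (28) can be illustrated with a small Python sketch. It 
is a sketch under simplifying assumptions rather than the paper's implementation: a single primary 
dimension, one shared rectangular quadrature grid for every specific dimension, dichotomous 
logistic items, and hypothetical function names (`lw`, `convolve_scores`, `rest_score_posteriors`, 
`trace`). The indicator $1_s(s_1 + s_2)$ is realized by accumulating each product into position 
`s1 + s2`. 

```python
import numpy as np

def lw(traces, grid_shape):
    """Summed score likelihoods; each trace has shape (K,) + grid_shape."""
    L = np.ones((1,) + grid_shape)
    for T in traces:
        new = np.zeros((L.shape[0] + T.shape[0] - 1,) + grid_shape)
        for k in range(T.shape[0]):
            new[k:k + L.shape[0]] += L * T[k]
        L = new
    return L

def convolve_scores(A, B):
    """Combine two score likelihood arrays (score axis first) on the same grid."""
    out = np.zeros((A.shape[0] + B.shape[0] - 1,) + A.shape[1:])
    for s in range(B.shape[0]):
        out[s:s + A.shape[0]] += A * B[s]
    return out

def rest_score_posteriors(clusters, n, i, h, g):
    """P_ik(s) and P_(i)(s) for item i of cluster n, Equations (27) and (28).

    clusters: list of lists of trace arrays of shape (K, Q_eta, Q_xi).
    h, g: normalized quadrature weights for eta and for each xi (assumed shared).
    """
    q_eta, q_xi = h.size, g.size
    # L_(n)(s1|eta): integrate xi_m out of every other cluster, then combine.
    L_other = np.ones((1, q_eta))
    for m, items in enumerate(clusters):
        if m != n:
            L_other = convolve_scores(L_other, lw(items, (q_eta, q_xi)) @ g)
    # Within cluster n, likelihoods without item i still depend on xi_n.
    items_n = clusters[n]
    L_in = lw(items_n[:i] + items_n[i + 1:], (q_eta, q_xi))   # (s2, Q_eta, Q_xi)
    T = items_n[i]                                            # (K, Q_eta, Q_xi)
    inner_k = np.einsum('sqr,kqr,r->skq', L_in, T, g)         # xi_n integrated out
    inner_0 = L_in @ g
    n_rest = L_other.shape[0] + L_in.shape[0] - 1
    P_ik = np.zeros((n_rest, T.shape[0]))
    P_i = np.zeros(n_rest)
    for s1 in range(L_other.shape[0]):
        for s2 in range(L_in.shape[0]):                       # indicator: s = s1 + s2
            P_ik[s1 + s2] += (L_other[s1] * inner_k[s2]) @ h
            P_i[s1 + s2] += (L_other[s1] * inner_0[s2]) @ h
    return P_ik, P_i

def trace(a_eta, a_xi, c, eta, xi):
    """Hypothetical dichotomous bifactor item on the (eta, xi) grid."""
    p = 1.0 / (1.0 + np.exp(-(c + a_eta * eta[:, None] + a_xi * xi[None, :])))
    return np.stack([1.0 - p, p])

grid = np.linspace(-4.0, 4.0, 21)
w = np.exp(-0.5 * grid ** 2)
w /= w.sum()
clusters = [
    [trace(1.2, 0.8, 0.0, grid, grid), trace(1.0, 0.8, -0.5, grid, grid)],
    [trace(0.9, 1.1, 0.3, grid, grid), trace(1.4, 0.6, 0.0, grid, grid)],
    [trace(1.1, 0.7, -0.2, grid, grid), trace(0.8, 0.9, 0.5, grid, grid)],
]
P_ik, P_i = rest_score_posteriors(clusters, n=1, i=0, h=w, g=w)
E = P_ik / P_i[:, None]     # model-implied E_ik(s) for use in Equation (26)
```

In this sketch the work grows with the number of clusters only through the one-dimensional 
convolutions over scores, because every specific dimension is integrated out within its own 
cluster before clusters are combined. 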

Finally, the model implied summed score probabilities themselves, when compared against 
the observed probabilities, may be useful for diagnosing the ubiquitous latent variable normality 
assumption for the primary dimension in a testlet or bifactor model. While the idea itself is not 
new (see Ferrando & Lorenzo-Seva, 2001; Hambleton & Traub, 1973; Lord, 1953; Ross, 1966; 
Sinharay, Johnson, & Stern, 2006), its use in hierarchical item factor models does require the 
new Lord-Wingersky algorithm (Li & Cai, 2012). 
Discussion 


Hierarchical item factor models can relax some of the restrictive assumptions of 
unidimensional IRT models and they have been suggested as useful tools for educational and 
psychological measurement research and practice (Reise, 2012) in that they may better reflect the 
structure of measurement instruments. Their mathematical complexity, however, makes their 
routine use unrealistic. In particular, scoring tests with bifactor/testlet/two-tier models can be 
computationally demanding, and specialized software programs are required. This paper proposes 
an updated Lord-Wingersky algorithm that utilizes dimension reduction. The algorithm remains 
computationally efficient even with a large number of latent factors. 

With the updated Lord-Wingersky algorithm, one may adopt a hierarchical item factor 
model in the test calibration stage and produce summed score conversions that are as convenient 
to use in practical settings as the original Lord-Wingersky method. The conversion tables are 
properly adjusted for the effects of residual dependence. To the end-user, the conversion tables 
eliminate the scoring complexities associated with the adoption of a multidimensional 
measurement model. Once the table is assembled, no specialized software is necessary for the 
end-user to reap the benefits of hierarchical multidimensional IRT modeling, thereby eliminating 
one of the key barriers to more wide-spread applications of hierarchical item factor models. In 
addition, the new algorithm serves as the basis of new test linking methods (calibrated 
projection), encompasses traditional score combination approaches, and leads to new model fit 
diagnostic statistics. 